Greedy Partition Distance Under Stochastic Models - Analytic Results

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Partitioning genes is a very common task in genomics, and can be based on criteria such as gene function, homology, interactions, and more. Given two such partitions, a metric to compare them is called for. One such metric is based on the multi symmetric difference: elements are removed from both partitions until the two become identical. While this task can be solved exactly by maximum weight bipartite matching, in common settings in comparative genomics the standard algorithm for that problem may become impractical. In previous works we studied the universal pacemaker (UPM), where genes are clustered according to mutation rate correlation, and suggested a very fast greedy procedure for comparing partitions. Under arbitrary inputs, this procedure is guaranteed only a poor approximation ratio of 1/2. In this work we give a probabilistic analysis of the procedure under a common and natural stochastic environment. We show that under mild size requirements and a sound model assumption, the procedure returns the correct result with high probability. Furthermore, we show that in the context of the UPM this natural requirement holds automatically, which renders the fast greedy procedure statistically consistent. We also discuss the application of this procedure to gene orthology, a rudimentary task in comparative genomics where such a solution is imperative.
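The abstract does not spell out the greedy procedure, so the following is only a minimal sketch of one plausible reading: clusters of the two partitions are paired greedily by largest overlap, and the distance is the number of elements left outside the paired overlaps. The function name `greedy_partition_distance`, the representation of partitions as lists of sets, and the tie-breaking rule are assumptions made for illustration, not the authors' implementation.

```python
from itertools import product


def greedy_partition_distance(p1, p2):
    """Greedy estimate of the partition distance (illustrative sketch).

    p1, p2: lists of disjoint sets covering the same elements.
    Returns the number of elements that must be removed so that the
    greedily paired clusters coincide.
    """
    n = sum(len(cluster) for cluster in p1)

    # Weight of a candidate pair = size of the overlap of the two clusters.
    overlaps = [
        (len(a & b), i, j)
        for (i, a), (j, b) in product(enumerate(p1), enumerate(p2))
        if a & b
    ]

    # Heaviest overlap first; ties broken by cluster index (an assumption).
    overlaps.sort(key=lambda t: (-t[0], t[1], t[2]))

    used1, used2 = set(), set()
    kept = 0
    for weight, i, j in overlaps:
        # Each cluster may be paired at most once.
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            kept += weight

    return n - kept


if __name__ == "__main__":
    first = [{1, 2, 3}, {4, 5}, {6}]
    second = [{1, 2}, {3, 4, 5}, {6}]
    print(greedy_partition_distance(first, second))
```

An exact answer would instead solve a maximum weight bipartite matching over the same overlap weights. The greedy pairing can miss the optimum on adversarial inputs, for example when one large overlap blocks two slightly smaller ones that together weigh more.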

Original language: English
Title of host publication: Bioinformatics Research and Applications - 15th International Symposium, ISBRA 2019, Proceedings
Editors: Min Li, Zhipeng Cai, Pavel Skums
Publisher: Springer Verlag
Pages: 257-269
Number of pages: 13
ISBN (Print): 9783030202415
State: Published - 2019
Event: 15th International Symposium on Bioinformatics Research and Applications, ISBRA 2019 - Barcelona, Spain
Duration: 3 Jun 2019 to 6 Jun 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11490 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th International Symposium on Bioinformatics Research and Applications, ISBRA 2019
Country/Territory: Spain
City: Barcelona
Period: 3/06/19 to 6/06/19

Bibliographical note

Publisher Copyright:
© 2019, Springer Nature Switzerland AG.

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
