Partitioning genes is a very common task in genomics, based on criteria such as gene function, homology, and interactions. Given two such partitions, a metric to compare them is called for. One such metric is based on the multi symmetric difference: elements are removed from both partitions until the two become identical. While this task can be solved exactly by maximum weight bipartite matching, in common settings in comparative genomics the standard algorithm for that problem may become impractical. In previous work we studied the universal pacemaker (UPM), in which genes are clustered according to mutation rate correlation, and suggested a very fast greedy procedure for comparing partitions. Under arbitrary inputs, this procedure is guaranteed only a poor approximation ratio of 1/2. In this work we give a probabilistic analysis of the procedure under a common and natural stochastic environment. We show that under mild size requirements and a sound model assumption, the procedure returns the correct result with high probability. Furthermore, we show that in the context of the UPM this natural requirement holds automatically, which yields statistical consistency of the fast greedy procedure. We also discuss the application of this procedure to gene orthology, a rudimentary task in comparative genomics, where such a solution is imperative.
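As a rough illustration of the idea in the abstract, the sketch below compares two partitions of the same element set by greedily matching clusters in order of decreasing overlap, then counting the elements that fall outside the matched overlaps. This is one plausible reading of a greedy matching heuristic, not necessarily the authors' exact procedure. The function names `overlap_weights` and `greedy_partition_distance` are hypothetical, and the code assumes both partitions cover the same elements.

```python
from itertools import product


def overlap_weights(p1, p2):
    """Map each cluster pair (i, j) with a nonempty intersection to its overlap size."""
    return {
        (i, j): len(a & b)
        for (i, a), (j, b) in product(enumerate(p1), enumerate(p2))
        if a & b
    }


def greedy_partition_distance(p1, p2):
    """Greedy estimate of how many elements must be removed to make p1 and p2 identical.

    Assumption (not from the source): p1 and p2 are lists of disjoint sets
    covering the same elements. Cluster pairs are taken in order of decreasing
    overlap, and a pair is kept only if neither cluster is already matched.
    """
    weights = overlap_weights(p1, p2)
    used1, used2 = set(), set()
    matching, kept = [], 0
    # Sort by overlap size (largest first); ties are broken by cluster indices.
    for (i, j), w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            matching.append((i, j))
            kept += w
    total = len(set().union(*p1))
    # Elements outside the matched overlaps are the ones to remove.
    return total - kept, matching


if __name__ == "__main__":
    p1 = [{1, 2, 3}, {4, 5}, {6}]
    p2 = [{1, 2}, {3, 4, 5}, {6}]
    distance, matching = greedy_partition_distance(p1, p2)
    print(distance, matching)  # 1 [(0, 0), (1, 1), (2, 2)]
```

In this toy input, removing element 3 makes the two partitions identical, so the greedy estimate matches the exact answer. In general, a greedy matching of this kind is only guaranteed to retain at least half of the optimal matching weight, so the returned distance is an upper bound on the exact one.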
|Title of host publication|Bioinformatics Research and Applications - 15th International Symposium, ISBRA 2019, Proceedings|
|Editors|Min Li, Zhipeng Cai, Pavel Skums|
|Publication status|Published - 2019|
|Event|15th International Symposium on Bioinformatics Research and Applications, ISBRA 2019 - Barcelona, Spain. Duration: 3 Jun 2019 → 6 Jun 2019|
|Series|Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
Bibliographical note
Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science