Abstract
The advent of the genomic era has produced an incredible wealth and resolution of molecular data, posing an unprecedented challenge for molecular systematics and necessitating novel techniques and paradigms. Consequently, whole-genome approaches were developed to extract the evolutionary signal by taking advantage of a larger amount of data. In parallel, genome dynamics (GD)-based approaches were suggested, in light of the understanding that in prokaryotes GD events, primarily gene gain and loss, provide a significantly richer signal than point mutations in ubiquitous housekeeping genes. However, proper modeling of these data and the processes generating them has lagged behind the pace of their accumulation, both because of a lack of deep understanding and because of technical difficulties. Among the central hurdles to accurate modeling of real data is the relaxation of rate constancy, particularly the untying of gain and loss rates. This relaxation violates key assumptions such as constant genome size, a constant gene set, and model reversibility, and has vast implications for implementation. This work presents a generic stochastic model, the two-ratio process (TRP), which encompasses and deals with these complications. As a special case, it contains the Poissonian process with different gene gain and loss rates as a form of the Birth-Death process with varying population sizes. The lack of reversibility invalidates traditional phylogenetic approaches, motivating a novel two-stage phylogenetic approach in which accurate, bidirectional parameters are first inferred for triplets and later combined by a special cherry-picking method into a complete tree. We show by algebraic techniques that this method is statistically consistent. The method, implemented in the software TDDR (Triplets Directed Distances Reconstruction), was applied to synthetic data, showing an advantage over other approaches that handle similar data but without the same model assumption.
We also applied it to the Alignable Tight Genomic Clusters (ATGC) database, where it showed high adequacy to the observed data. The full text of this article appears on bioRxiv.org at https://www.biorxiv.org/content/10.1101/2025.01.27.634999v1. The TDDR code is available on GitHub: https://github.com/YoavDvir/TDDR.
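To make concrete why untying gain and loss rates breaks the constant-genome-size assumption, the following is a minimal sketch, not the TRP model or the TDDR implementation. It assumes a simple immigration-death process: genes are gained at a constant rate `gain_rate`, and each gene present is lost independently at rate `loss_rate`. The function names, parameter names, and numeric values are all hypothetical and chosen only for illustration.

```python
import math
import random


def expected_size(n0, gain_rate, loss_rate, t):
    """Mean gene count at time t under the assumed immigration-death sketch."""
    decay = math.exp(-loss_rate * t)
    return n0 * decay + (gain_rate / loss_rate) * (1.0 - decay)


def simulate_size(n0, gain_rate, loss_rate, t, rng):
    """Gillespie simulation of the gene count along one branch of length t."""
    n, clock = n0, 0.0
    while True:
        total = gain_rate + loss_rate * n  # total event rate in the current state
        clock += rng.expovariate(total)    # waiting time to the next event
        if clock > t:
            return n
        # Pick a gain or a loss in proportion to their current rates.
        if rng.random() * total < gain_rate:
            n += 1
        else:
            n -= 1


# Illustrative (made-up) parameters: the genome starts above its equilibrium size.
rng = random.Random(0)
n0, gain, loss, t = 100, 5.0, 0.1, 10.0
sizes = [simulate_size(n0, gain, loss, t, rng) for _ in range(2000)]
mean_size = sum(sizes) / len(sizes)
```

Under these assumptions the expected genome size drifts from `n0` toward `gain_rate / loss_rate` rather than staying fixed, and the forward and backward directions along a branch are not interchangeable. This is the kind of behavior that a model assuming constant genome size or reversibility cannot capture.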
| Original language | English |
|---|---|
| Title of host publication | Research in Computational Molecular Biology - 29th International Conference, RECOMB 2025, Proceedings |
| Editors | Sriram Sankararaman |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 414-419 |
| Number of pages | 6 |
| ISBN (Print) | 9783031902512 |
| State | Published - 2025 |
| Event | 29th International Conference on Research in Computational Molecular Biology, RECOMB 2025 - Seoul, Korea, Republic of. Duration: 26 Apr 2025 → 29 Apr 2025 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 15647 LNBI |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 29th International Conference on Research in Computational Molecular Biology, RECOMB 2025 |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Seoul |
| Period | 26/04/25 → 29/04/25 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
Keywords
- Birth-Death Processes
- Phylogenetics
- Prokaryotic Genome Dynamics
- Statistical Consistency
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science
Title
- Untying Rates of Gene Gain and Loss Leads to a New Phylogenetic Approach