Abstract
Motivation: Phylogenies - the evolutionary histories of groups of organisms - play a major role in representing relationships among biological entities. Although many biological processes can be effectively modeled as tree-like relationships, others, such as hybrid speciation and horizontal gene transfer (HGT), result in networks, rather than trees, of relationships. Hybrid speciation is a significant evolutionary mechanism in plants, fish and other groups of species. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Maximum parsimony is one of the most commonly used criteria for phylogenetic tree inference. Roughly speaking, inference based on this criterion seeks the tree that minimizes the amount of evolution. In 1990, Jotun Hein proposed using this criterion for inferring the evolution of sequences subject to recombination. Preliminary results on small synthetic datasets. Nakhleh et al. (2005) demonstrated the criterion's application to phylogenetic network reconstruction in general and HGT detection in particular. However, the naive algorithms used by the authors are inapplicable to large datasets due to their demanding computational requirements. Further, no rigorous theoretical analysis of computing the criterion was given, nor was it tested on biological data. Results: In the present work we prove that the problem of scoring the parsimony of a phylogenetic network is NP-hard and provide an improved fixed parameter tractable algorithm for it. Further, we devise efficient heuristics for parsimony-based reconstruction of phylogenetic networks. We test our methods on both synthetic and biological data (rbcL gene in bacteria) and obtain very promising results.
Original language | English |
---|---|
Pages (from-to) | e123-e128 |
Journal | Bioinformatics |
Volume | 23 |
Issue number | 2 |
DOIs | |
State | Published - 2007 |
Externally published | Yes |
Bibliographical note
Funding Information:This work was supported in part by the Rice Terascale Cluster funded by NSF under grant EIA-0216467, Intel, and HP. S. S. was supported in part by NSF grant CCR-0105533.
ASJC Scopus subject areas
- Statistics and Probability
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics