Accurate phylogenetic reconstruction methods are currently limited to at most a few dozen taxa. Supertree methods construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Hence, to construct the tree of life over a million and a half different species, applying a supertree method to the output of accurate methods is inevitable. Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet-based reconstruction. This problem lies at the root of many tree reconstruction methods, and theoretical as well as experimental results have been reported. Nevertheless, dealing with false, conflicting quartet trees remains problematic. In this paper, we describe an algorithm for constructing a tree from a set of input quartet trees, even when a significant fraction of them are erroneous. We show empirically that conflicts in the inputs are handled satisfactorily and that the algorithm significantly outperforms, and runs faster than, the Matrix Representation with Parsimony (MRP) methods that have previously been most successful in dealing with supertrees. Our algorithm follows a divide-and-conquer scheme in which the divide step uses a semidefinite programming (SDP) formulation of MaxCut. We remark that this builds on previous work of ours for piecing together trees from rooted triplet trees. The recursion for unrooted quartets, however, is more complicated: even with a completely consistent set of quartet trees the problem is NP-hard, whereas for rooted triplets there is a linear-time algorithm. This complexity leads to several issues and some solutions of possible independent interest.
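To make the divide step more concrete, the following is a minimal sketch of how a set of quartet trees might be turned into a weighted graph whose cut suggests a bipartition of the taxa. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the edge-weighting scheme, so the weights used here (reward for separating the two sides of a quartet, penalty `alpha` for separating a sibling pair) are assumed, and the SDP formulation of MaxCut is replaced by a simple flip-based local search. The names `build_quartet_graph`, `cut_value`, and `local_search_cut` are hypothetical.

```python
def build_quartet_graph(quartets, alpha=1.0):
    """Build edge weights from quartets given as ((a, b), (c, d)), meaning ab|cd.

    Assumed weighting (not taken from the abstract): cross pairs get +1,
    sibling pairs get -alpha.
    """
    weights = {}

    def add(u, v, w):
        key = (u, v) if u < v else (v, u)
        weights[key] = weights.get(key, 0.0) + w

    for (a, b), (c, d) in quartets:
        # Pairs on opposite sides of the quartet: cutting them is rewarded.
        for u in (a, b):
            for v in (c, d):
                add(u, v, 1.0)
        # Sibling pairs: cutting them is penalized.
        add(a, b, -alpha)
        add(c, d, -alpha)
    return weights


def cut_value(weights, side):
    """Total weight of the edges whose endpoints lie on different sides."""
    return sum(w for (u, v), w in weights.items() if side[u] != side[v])


def local_search_cut(taxa, weights):
    """Heuristic stand-in for the SDP MaxCut step: flip single taxa while it helps."""
    taxa = sorted(taxa)
    side = {t: i % 2 for i, t in enumerate(taxa)}  # arbitrary starting split
    best = cut_value(weights, side)
    improved = True
    while improved:
        improved = False
        for t in taxa:
            side[t] ^= 1
            value = cut_value(weights, side)
            # Accept the flip only if it improves the cut and keeps both sides non-empty.
            if value > best and 0 < sum(side.values()) < len(taxa):
                best = value
                improved = True
            else:
                side[t] ^= 1  # undo the flip
    left = {t for t in taxa if side[t] == 0}
    right = {t for t in taxa if side[t] == 1}
    return left, right


if __name__ == "__main__":
    quartets = [(("a", "b"), ("c", "d"))]
    graph = build_quartet_graph(quartets)
    print(local_search_cut({"a", "b", "c", "d"}, graph))
```

In a divide-and-conquer scheme of the kind the abstract describes, each side of the resulting bipartition would then be processed recursively with the quartets restricted to it. A local search like this one can stall in a local optimum, which is one reason an SDP relaxation may be preferred in practice.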
|Number of pages||15|
|Journal||IEEE/ACM Transactions on Computational Biology and Bioinformatics|
|State||Published - 2010|
Bibliographical note
Funding Information:
The authors thank Tandy Warnow very much for insightful comments on an earlier version, and Usman Roshan for providing the RBCL data. They are also very thankful to the three anonymous referees, who gave many extremely helpful comments, in particular regarding the simulation study, and to former TCBB editor-in-chief Dan Gusfield for comments on structure and wording. The research was done at UC Berkeley and supported by NIH Grant R01-HG02362-02 and US NSF Grant CCR-0105533. Satish Rao was supported by US NSF Award-0331494.
- Phylogenetic reconstruction
ASJC Scopus subject areas
- Applied Mathematics