TY - JOUR
T1 - Faster algorithms for optimal multiple sequence alignment based on pairwise comparisons
AU - Bilu, Yonatan
AU - Agarwal, Pankaj K.
AU - Kolodny, Rachel
N1 - Funding Information:
The authors would like to thank Chris Lee and Nati Linial for enlightening discussions, Eylon Portugaly for his help in implementing the algorithm, and the referees for many helpful comments. Y. Bilu is supported by the Dewey David Stone Postdoctoral Fellowship and UniNet EC NEST consortium contract number 12990. P.K. Agarwal is supported by US National Science Foundation (NSF) grants CCR-00-86013, EIA-98-70724, EIA-01-31905, and CCR-02-04118, and by a grant from the US-Israel Binational Science Foundation. R. Kolodny is supported by NSF grant CCR-00-86013. Part of this work was done while Y. Bilu was at the School of Engineering and Computer Science, The Hebrew University of Jerusalem, and R. Kolodny was at the Department of Computer Science, Stanford University and visiting Duke University.
PY - 2006/10
Y1 - 2006/10
AB - Multiple Sequence Alignment (MSA) is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with the number of input sequences. Hence, many heuristics were suggested for the problem. We consider a version of the MSA problem where the goal is to find an optimal alignment in which matches are restricted to positions in predefined matching segments. We present several techniques for making the dynamic programming algorithm more efficient, while still finding an optimal solution under these restrictions. We prove that it suffices to find an optimal alignment of the predefined sequence segments, rather than single letters, thereby reducing the input size and thus improving the running time. We also identify "shortcuts" that expedite the dynamic programming scheme. Empirical study shows that, taken together, these observations lead to an improved running time over the basic dynamic programming algorithm by 4 to 12 orders of magnitude, while still obtaining an optimal solution. Under the additional assumption that matches between segments are transitive, we further improve the running time for finding the optimal solution by restricting the search space of the dynamic programming algorithm.
KW - Algorithms
KW - Dynamic programming
KW - Multiple sequence alignment
KW - Shortest path
UR - http://www.scopus.com/inward/record.url?scp=33845609285&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2006.53
DO - 10.1109/TCBB.2006.53
M3 - Article
C2 - 17085849
AN - SCOPUS:33845609285
VL - 3
SP - 408
EP - 422
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
SN - 1545-5963
IS - 4
ER -