TY - GEN
T1 - SplittingHeirs
T2 - 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010
AU - Climer, Sharlee
AU - Templeton, Alan R.
AU - Zhang, Weixiong
PY - 2010
Y1 - 2010
N2 - Phasing genotype data to identify the composite haplotype pairs is a widely studied problem due to its value for understanding genetic contributions to diseases, population genetics research, and other significant endeavors. The accuracy of the phasing is crucial, as identification of haplotypes is frequently the first step of expensive and vitally important studies. We present a combinatorial approach to this problem that we call SplittingHeirs. This approach is biologically motivated, as it is based on three widely accepted principles: there tend to be relatively few unique haplotypes within a population, there tend to be clusters of haplotypes that are similar to each other, and some haplotypes are relatively common. We have tested SplittingHeirs, along with several popular existing phasing methods including PHASE, HAP, EM, and Pure Parsimony, on seven sets of haplotype data for which the true phase is known. In all cases, our method yields the highest accuracy obtained by any of these methods. Furthermore, SplittingHeirs is robust and achieved higher accuracy than any of the other approaches on the two datasets with high recombination rates. The success of SplittingHeirs validates the assumptions made by the dense graph model and highlights the benefits of finding globally optimal solutions.
UR - http://www.scopus.com/inward/record.url?scp=77958076282&partnerID=8YFLogxK
U2 - 10.1145/1854776.1854798
DO - 10.1145/1854776.1854798
M3 - Conference contribution
AN - SCOPUS:77958076282
SN - 9781450304382
T3 - 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010
SP - 127
EP - 136
BT - 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010
Y2 - 2 August 2010 through 4 August 2010
ER -