Tracing the most parsimonious indel history

Sagi Snir, Lior Pachter

Research output: Contribution to journal › Article › peer-review


Sequence alignment (the grouping of homologous bases into one column) is fundamental to almost any task in comparative genomics. In practice, this means positing gaps in the genomic sequences to account for insertion and deletion events (indels). The interrelationship between sequence alignment and phylogenetic reconstruction has drawn substantial attention recently, with works showing the significance of differences in alignments. One plausible approach in this direction is to grade the suitability of a tree to an associated alignment and vice versa. Here we present a combinatorial (as opposed to statistical) approach based on the indel history. We show, both by simulations and by using real biological data from the Encyclopedia of DNA Elements (ENCODE), that this criterion is sound. The novelty of our approach lies in distinguishing between insertions and deletions, and in augmenting the analysis with a dimension of "depth," extending it from the sequence space to the phylogenetic space. Using this approach, we perform a comprehensive study of characteristic indel behavior among mammals in both coding and non-coding regions. Our results show significant differences in indel patterns between coding and non-coding regions. We also show other characteristic patterns of indel evolution in the depth of the underlying phylogeny.

Original language: English
Pages (from-to): 967-986
Number of pages: 20
Journal: Journal of Computational Biology
Issue number: 8
State: Published - 1 Aug 2011


Keywords

  • algorithms
  • biology
  • computational molecular biology
  • evolution

ASJC Scopus subject areas

  • Computational Mathematics
  • Genetics
  • Molecular Biology
  • Computational Theory and Mathematics
  • Modeling and Simulation

