Gene proximity analysis across whole genomes via PQ trees

Research output: Contribution to journalArticlepeer-review

Abstract

Permutations on strings representing gene clusters on genomes have been studied earlier by Uno and Yagiura (2000), Heber and Stoye (2001), Bergeron et al. (2002), Eres et al. (2003), and Schmidt and Stoye (2004) and the idea of a maximal permutation pattern was introduced by Eres et al. (2003). In this paper, we present a new tool for representation and detection of gene clusters in multiple genomes, using PQ trees (Booth and Leuker, 1976): this describes the inner structure and the relations between clusters succinctly, aids in filtering meaningful from apparently meaningless clusters, and also gives a natural and meaningful way of visualizing complex clusters. We identify a minimal consensus PQ tree and prove that it is equivalent to a maximal π pattern (Eres et al., 2003) and each subgraph of the PQ tree corresponds to a nonmaximal permutation pattern. We present a general scheme to handle multiplicity in permutations and also give a linear time algorithm to construct the minimal consensus PQ tree. Further, we demonstrate the results on whole genome datasets. In our analysis of the whole genomes of human and rat, we found about 1.5 million common gene clusters but only about 500 minimal consensus PQ trees, with E. Coli K-12 and B. Subtilis genomes, we found only about 450 minimal consensus PQ trees out of about 15,000 gene clusters, and when comparing eight different Chloroplast genomes, we found only 77 minimal consensus PQ trees out of about 6,700 gene clusters. Further, we show specific instances of functionally related genes in two of the cases.

Original languageEnglish
Pages (from-to)1289-1306
Number of pages18
JournalJournal of Computational Biology
Volume12
Issue number10
DOIs
StatePublished - Dec 2005

Keywords

  • Clusters
  • Comparative genomics
  • Evolutionary analysis
  • Motifs
  • PQ trees
  • Pattern discovery
  • Patterns
  • Permutation patterns
  • Whole genome analysis

ASJC Scopus subject areas

  • Computational Mathematics
  • Genetics
  • Molecular Biology
  • Computational Theory and Mathematics
  • Modeling and Simulation

Fingerprint

Dive into the research topics of 'Gene proximity analysis across whole genomes via PQ trees'. Together they form a unique fingerprint.

Cite this