Permutation pattern discovery in biosequences

Revital Eres, Gad M. Landau, Laxmi Parida

Research output: Contribution to journal / Article / peer-review

Abstract

Functionally related genes often appear in each other's neighborhood on the genome; however, the order of the genes may not be the same. These groups or clusters of genes may have an ancient evolutionary origin or may signify some other critical phenomenon and may also aid in function prediction of genes. Such gene clusters also aid toward solving the problem of local alignment of genes. Similarly, clusters of protein domains, albeit appearing in different orders in the protein sequence, suggest common functionality in spite of being nonhomologous. In this paper, we address the problem of automatically discovering clusters of entities, be they genes or domains: we formalize the abstract problem as a discovery problem called the π pattern problem and give an algorithm that automatically discovers the clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns, without any loss of information. We demonstrate the automatic pattern discovery tool on motifs on E. coli protein sequences.
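
To make the idea of an order-independent cluster concrete, the following is a minimal brute-force sketch and not the algorithm from the paper. It assumes one simple reading of the problem: a fixed window length, and a pattern counted as occurring wherever a window holds the same multiset of symbols in any order. The function name `permutation_patterns` and the parameters `length` and `quorum` are hypothetical, and the maximal-pattern notation mentioned in the abstract is not modeled.

```python
from collections import defaultdict


def permutation_patterns(sequences, length, quorum):
    """Group fixed-length windows by their multiset of symbols.

    Illustrative assumption: a window matches a pattern when it contains
    the same symbols in any order. Returns a dict mapping each sorted
    symbol tuple to its (sequence index, start position) occurrences,
    keeping only patterns that occur at least `quorum` times.
    """
    occurrences = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for start in range(len(seq) - length + 1):
            # Sorting the window gives an order-independent key.
            key = tuple(sorted(seq[start:start + length]))
            occurrences[key].append((seq_index, start))
    return {k: v for k, v in occurrences.items() if len(v) >= quorum}


# Hypothetical gene orders on two genomes: the cluster {a, b, c}
# appears in both, but in a different order.
genomes = [["a", "b", "c", "d"], ["c", "b", "a", "e"]]
clusters = permutation_patterns(genomes, length=3, quorum=2)
# clusters == {("a", "b", "c"): [(0, 0), (1, 0)]}
```

This sketch rescans every window for each chosen length, so it is only suitable for small inputs; it is meant to illustrate the matching criterion rather than an efficient discovery procedure.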

Original language: English
Pages (from-to): 1050-1060
Number of pages: 11
Journal: Journal of Computational Biology
Volume: 11
Issue number: 6
State: Published - 2004

Keywords

  • Clusters
  • Combinatorial algorithms on words
  • Data mining
  • Design and analysis of algorithms
  • Discovery
  • Motifs
  • Patterns

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics
