Compositional spectrum - Revealing patterns for genomic sequence characterization and comparison

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper we propose a natural approach to characterizing genomic sequences, based on the occurrences of fixed-length words (strings over the alphabet {A,C,G,T}) from a sufficiently large set W of words that are, in the general case, arbitrary. According to our approach, any genomic sequence can be characterized by a histogram of the frequencies with which the words from W occur in it with imperfect matching; this histogram is called a compositional spectrum (CS). The specificity of CSs is manifested in a reasonable similarity between spectra obtained from different stretches of the same genome and, simultaneously, in a broad range of dissimilarities between the spectral characteristics of different genomes. The proposed approach may have various applications in intra- and intergenomic sequence comparisons.

Original language: English
Pages (from-to): 447-457
Number of pages: 11
Journal: Physica A: Statistical Mechanics and its Applications
Volume: 312
Issue number: 3-4
State: Published - 15 Sep 2002

Keywords

  • Compositional spectra
  • DNA sequences
  • Imperfect matching
  • Sequence comparisons
  • Set of words

ASJC Scopus subject areas

  • Statistics and Probability
  • Condensed Matter Physics
