Abstract
In this paper we propose a natural approach to characterizing genomic sequences based on occurrences of fixed-length words (strings over the alphabet {A,C,G,T}) drawn from a sufficiently large set W of words, which in the general case may be arbitrary. In our approach, any genomic sequence can be characterized by a histogram of the frequencies of imperfect matches of the words from W; this histogram is called the compositional spectrum (CS). The specificity of CSs manifests itself in a reasonable similarity of spectra obtained on different stretches of the same genome and, at the same time, in a broad range of dissimilarities between the spectra of different genomes. The proposed approach may have various applications in intra- and intergenomic sequence comparisons.
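The idea of a compositional spectrum can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not fix: an "imperfect match" is taken to mean a window of the sequence within Hamming distance k of a word, the spectrum is the raw count per word, and spectra are compared with 1 minus the Pearson correlation. The word set, word length, k, and the functions `hamming_le`, `compositional_spectrum`, and `spectrum_distance` are all hypothetical illustrations, not the paper's published procedure.

```python
import random


def hamming_le(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def compositional_spectrum(seq: str, words: list[str], k: int) -> list[int]:
    """For each word in `words`, count the windows of `seq` it matches with <= k mismatches.

    Assumption: 'imperfect matching' is modelled as Hamming distance <= k.
    """
    seq = seq.upper()
    words = [w.upper() for w in words]
    length = len(words[0])
    if any(len(w) != length for w in words):
        raise ValueError("all words in W must have the same length")
    windows = [seq[i:i + length] for i in range(len(seq) - length + 1)]
    return [sum(hamming_le(w, win, k) for win in windows) for w in words]


def spectrum_distance(s1: list[int], s2: list[int]) -> float:
    """Dissimilarity 1 - Pearson r between two spectra (a hypothetical choice of measure)."""
    n = len(s1)
    m1, m2 = sum(s1) / n, sum(s2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(s1, s2))
    v1 = sum((a - m1) ** 2 for a in s1)
    v2 = sum((b - m2) ** 2 for b in s2)
    if v1 == 0 or v2 == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    return 1 - cov / (v1 * v2) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data only: a random word set W and a random "genome" split into two stretches.
    W = ["".join(rng.choice("ACGT") for _ in range(6)) for _ in range(50)]
    genome = "".join(rng.choice("ACGT") for _ in range(20000))
    first, second = genome[:10000], genome[10000:]
    cs1 = compositional_spectrum(first, W, k=1)
    cs2 = compositional_spectrum(second, W, k=1)
    print("distance between stretches:", spectrum_distance(cs1, cs2))
```

The brute-force scan costs O(|W| · n · L) for a sequence of length n and words of length L, which is fine for illustration; a real analysis of whole genomes would need an indexed or bit-parallel search. Rank correlation or other distances between histograms would be equally plausible choices for `spectrum_distance`.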
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 447-457 |
| Number of pages | 11 |
| Journal | Physica A: Statistical Mechanics and its Applications |
| Volume | 312 |
| Issue number | 3-4 |
| DOIs | |
| State | Published - 15 Sep 2002 |
Keywords
- Compositional spectra
- DNA sequences
- Imperfect matching
- Sequence comparisons
- Set of words
ASJC Scopus subject areas
- Statistics and Probability
- Condensed Matter Physics
Title
Compositional spectrum - Revealing patterns for genomic sequence characterization and comparison