Finding repeats within a string is an important computational problem, with applications in data compression and in molecular biology. Both exact and inexact repeats occur frequently in the genome, and certain repeats are known to be related to human diseases. A multiple tandem repeat in a sequence S is a (periodic) substring r of S of the form r = u^a u′, where u (the period) is a prefix of r, u′ is a (possibly empty) prefix of u, and a ≥ 2 is an integer. A run is a maximal (non-extendable) multiple tandem repeat. An approximate run is a run with errors, i.e., the repeated copies of the period are similar but not identical. Many measures have been proposed to capture the similarity among the periods: one may count the errors between consecutive periods, between all pairs of periods, or between each period and a consensus string. Another possible measure is the number of positions in the periods that may differ. In this talk I will survey a range of our results in this area. Various parts of this work are joint work with Maxime Crochemore, Gene Myers, Jeanette Schmidt and Dina Sokol.
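The definitions above can be made concrete with a small brute-force sketch. It is not one of the algorithms surveyed in the talk. `naive_runs` tries every period p and extends each p-periodic stretch as far as it goes, which takes O(n²) time. `period_scores` computes the four similarity measures listed above. It assumes Hamming distance and compares only whole periods, ignoring a trailing partial period u′. Both are illustrative choices, since the abstract does not fix a distance. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def naive_runs(s: str) -> list[tuple[int, int, int]]:
    """Return every exact run of s as (start, end_exclusive, smallest_period).

    Brute force, O(n^2): for each candidate period p, find maximal stretches
    where s[k] == s[k + p]; each stretch spans a substring with period p that
    cannot be extended left or right. Periods are tried in increasing order,
    so the first p that reports an interval is that run's smallest period.
    """
    n = len(s)
    seen: set[tuple[int, int]] = set()
    runs = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if s[k] != s[k + p]:
                k += 1
                continue
            i = k
            while k < n - p and s[k] == s[k + p]:
                k += 1
            start, end = i, k + p  # s[start:end] has period p and is maximal
            if end - start >= 2 * p and (start, end) not in seen:
                seen.add((start, end))
                runs.append((start, end, p))
    return sorted(runs)


def hamming(x: str, y: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(x) != len(y):
        raise ValueError("hamming() needs strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def period_scores(r: str, p: int) -> dict[str, int]:
    """Score how far r is from being an exact repeat with period length p.

    Only whole periods are compared; a trailing partial period is ignored.
    """
    periods = [r[i:i + p] for i in range(0, len(r) - p + 1, p)]
    columns = list(zip(*periods))  # column j holds the j-th character of every period
    # Majority character per column; any tie-break gives the same total below.
    consensus = "".join(Counter(col).most_common(1)[0][0] for col in columns)
    return {
        "consecutive": sum(hamming(a, b) for a, b in zip(periods, periods[1:])),
        "all_pairs": sum(hamming(a, b) for a, b in combinations(periods, 2)),
        "consensus": sum(hamming(q, consensus) for q in periods),
        "differing_positions": sum(len(set(col)) > 1 for col in columns),
    }


print(naive_runs("aabab"))  # [(0, 2, 1), (1, 5, 2)]
print(period_scores("abcabdabe", 3))
# {'consecutive': 2, 'all_pairs': 3, 'consensus': 2, 'differing_positions': 1}
```

In the second example the periods are `abc`, `abd` and `abe`. They differ in only one position, but the pairwise measure counts three errors, while the consecutive and consensus measures each count two. This illustrates how the choice of measure changes which approximate runs are reported.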
| Field | Value |
|---|---|
| Title of host publication | String Processing and Information Retrieval - 15th International Symposium, SPIRE 2008, Proceedings |
| Editors | Andrew Turpin, Alistair Moffat, Amihood Amir |
| Number of pages | 1 |
| Status | Published - 2008 |
| Event | 15th International Symposium on String Processing and Information Retrieval, SPIRE 2008 - Melbourne, VIC, Australia |
| Duration | 10 Nov 2008 → 12 Nov 2008 |
| Series name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| Conference | 15th International Symposium on String Processing and Information Retrieval, SPIRE 2008 |
| Period | 10/11/08 → 12/11/08 |
Bibliographical note
Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2008.
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science (all)