## Abstract

A perfect tandem repeat within a string S is a substring r = r_{1} … r_{2l} of S for which r_{1} … r_{l} = r_{l+1} … r_{2l}. An approximate tandem repeat is a substring r = r_{1} … r_{l_{1}} … r_{l} for which r_{1} … r_{l_{1}} and r_{l_{1}+1} … r_{l} are similar. In this paper we consider two criteria of similarity: the Hamming distance (k mismatches) and the edit distance (k differences). For a string S of length n and an integer k, our algorithm reports all locally optimal approximate repeats r = ūȗ for which the Hamming distance of ū and ȗ is at most k in O(nk log(n/k)) time, or all those for which the edit distance of ū and ȗ is at most k in O(nk log k log n) time.
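To make the Hamming-distance definition concrete, the following is a minimal brute-force sketch. It is not the algorithm of the paper: it runs in O(n²) time and reports every substring whose two equal-length halves differ in at most k positions, without filtering for local optimality. The function name `hamming_tandem_repeats` and the output format are assumptions made for illustration.

```python
def hamming_tandem_repeats(s: str, k: int) -> list[tuple[int, int, int]]:
    """Return (start, half_length, mismatches) for every substring
    s[start : start + 2 * half_length] whose two halves differ in at
    most k positions. Brute-force illustration only."""
    n = len(s)
    results = []
    for half in range(1, n // 2 + 1):
        # Mismatches between the two halves of the window starting at 0.
        mism = sum(s[i] != s[i + half] for i in range(half))
        for start in range(n - 2 * half + 1):
            if mism <= k:
                results.append((start, half, mism))
            # Slide the window one position to the right: drop the pair
            # (start, start + half), add (start + half, start + 2 * half).
            if start + 2 * half < n:
                mism -= s[start] != s[start + half]
                mism += s[start + half] != s[start + 2 * half]
    return results


if __name__ == "__main__":
    print(hamming_tandem_repeats("abab", 0))  # exact repeats only
    print(hamming_tandem_repeats("abab", 1))  # allow one mismatch
```

With k = 0 the sketch degenerates to finding perfect tandem repeats; the sliding mismatch count is what keeps each half-length pass linear in n.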

Original language | English |
---|---|
Title of host publication | Combinatorial Pattern Matching - 4th Annual Symposium, CPM 1993, Proceedings |
Editors | Alberto Apostolico, Maxime Crochemore, Zvi Galil, Udi Manber |
Publisher | Springer Verlag |
Pages | 120-133 |
Number of pages | 14 |
ISBN (Print) | 9783540567646 |
State | Published - 1993 |
Externally published | Yes |
Event | Conference of the European Society for Fuzzy Logic and Technology, EUSFLAT 2017 and 16th International Workshop on Intuitionistic Fuzzy Sets and Generalized Nets, IWIFSGN 2017 - Warsaw, Poland Duration: 11 Sep 2017 → 15 Sep 2017 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 684 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

### Conference

Conference | Conference of the European Society for Fuzzy Logic and Technology, EUSFLAT 2017 and 16th International Workshop on Intuitionistic Fuzzy Sets and Generalized Nets, IWIFSGN 2017 |
---|---|
Country/Territory | Poland |
City | Warsaw |
Period | 11/09/17 → 15/09/17 |

### Bibliographical note

Funding Information: The perfect tandem repeat problem is a well-studied problem. Main and Lorentz \[ML-84\] present an O(n log n) algorithm, which reports all perfect tandem repeats, and Apostolico \[Ap-92\] describes an optimal speed-up parallel algorithm for the problem. Motivations for the exact repeat problem can be found in research in formal languages (see a survey in \[ML-85\]).

Important motivations for the approximate tandem repeat problem are found in different areas. In molecular biology, tandem repeats play an important role in both DNA and protein sequences. At the DNA level they act as "hot spots" that enable these regions to more rapidly conform to environmental changes. Such repeats are also frequent in bacterial proteins, where their function is less understood. The repeats in these applications are not exact. One can use different criteria to measure the similarity of the repeats. In this paper we consider two simple measures of similarity. While these measures are suitable for several of the above applications, they also lend themselves to the design of fast algorithms.

Given a string S and an integer k, the algorithm finds all nonempty substrings r = ūȗ for which: (i) the Hamming distance of ū and ȗ is at most k; or (ii) the edit distance of ū and ȗ is at most k. The Hamming distance of u and v is defined as the number of substitutions necessary to get v from u (u and v must be of the same length). The edit distance, as defined by Levenshtein \[L-66\], is the minimum number of deletions in u, substitutions, or insertions in v necessary to get v from u. In the case of the Hamming distance

\* e-mail: landau@pucs2.poly.edu. Partially supported by the New York State Science and Technology Foundation Center for Advanced Technology.

\*\* e-mail: jps@pucs4.poly.edu. Partially supported by NSF grant CCR-9110255 and the New York State Science and Technology Foundation Center for Advanced Technology.
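The two similarity measures defined in the note above can be sketched as follows. This is a textbook illustration, not code from the paper: `hamming_distance` counts substitutions between equal-length strings, and `edit_distance` uses the standard quadratic dynamic program for the Levenshtein distance. Both function names are assumptions chosen for this example.

```python
def hamming_distance(u: str, v: str) -> int:
    """Number of positions at which u and v differ (equal lengths required)."""
    if len(u) != len(v):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(u, v))


def edit_distance(u: str, v: str) -> int:
    """Minimum number of deletions, insertions, and substitutions
    needed to turn u into v (Levenshtein distance)."""
    # prev[j] holds the distance between the current prefix of u and v[:j].
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        curr = [i]  # turning u[:i] into the empty string takes i deletions
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from u
                curr[j - 1] + 1,         # insert b into v
                prev[j - 1] + (a != b),  # substitute a by b, or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(hamming_distance("karolin", "kathrin"))
    print(edit_distance("kitten", "sitting"))
```

Keeping only two rows of the table bounds memory by the length of v, which is enough when only the distance, and not the alignment itself, is needed.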

Publisher Copyright:

© Springer-Verlag Berlin Heidelberg 1993.

## ASJC Scopus subject areas

- Theoretical Computer Science
- Computer Science (all)