We consider a string matching problem in which the pattern is a template that matches many different strings with varying degrees of perfection. The quality of a match is given by a penalty matrix that assigns each pair of characters a score characterizing how well the characters match. Superfluous characters in the text or in the pattern may also occur, and the penalties for such gaps in the alignment are likewise given by the penalty matrix. For a text $T$ of length $n$ and a template $P$ of length $m$, we wish to find the best alignment of $T$ with $P^n$, the concatenation of $n$ copies of $P$ ($m$ will typically be much smaller than $n$). Such an alignment can be obtained by ignoring the periodic character of $P^n$ and solving a dynamic programming problem of size $O(n^2 m)$. We show that the structure of $P^n$ can be exploited and the problem reduced to essentially solving a dynamic programming problem of size $O(mn)$. If the complexity of computing gap penalties is $O(1)$ (which is frequently the case), our algorithm runs in $O(mn)$ time. The problem was motivated by a protein structure problem.
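The size reduction described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, only a simplified "wrap-around" dynamic program written under assumptions that are not in the source: penalties are minimized, every gap costs the same non-negative constant `gap` (the $O(1)$ case), the whole text is aligned against a prefix of the repeated template, and the names `naive_cost`, `wraparound_cost`, and `unit` are hypothetical. The first function fills the full table against $n$ copies of the template; the second keeps only $m$ columns per row and resolves the cyclic dependency between the last and first column with two sweeps.

```python
def naive_cost(text, pattern, sub_cost, gap):
    """Align all of text with the best prefix of pattern repeated len(text) times.

    Table size is (n + 1) x (n * m + 1), i.e. O(n^2 m) cells.
    """
    n, m = len(text), len(pattern)
    width = n * m
    prev = [length * gap for length in range(width + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * width
        for length in range(1, width + 1):
            p = pattern[(length - 1) % m]
            cur[length] = min(
                prev[length - 1] + sub_cost(text[i - 1], p),  # pair the two characters
                prev[length] + gap,                           # superfluous text character
                cur[length - 1] + gap,                        # superfluous template character
            )
        prev = cur
    return min(prev)


def wraparound_cost(text, pattern, sub_cost, gap):
    """Same cost, using one column per residue (prefix length mod m): O(mn) cells.

    Assumes gap >= 0 and a non-empty pattern.
    """
    m = len(pattern)
    prev = [j * gap for j in range(m)]
    for ch in text:
        # Moves that consume a text character; index j - 1 wraps to m - 1 when j == 0.
        cur = [
            min(prev[j - 1] + sub_cost(ch, pattern[j - 1]), prev[j] + gap)
            for j in range(m)
        ]
        # Skipping template characters links column j - 1 to column j cyclically;
        # two sweeps over the columns are enough to settle every value.
        for k in range(2 * m):
            j = k % m
            cur[j] = min(cur[j], cur[j - 1] + gap)
        prev = cur
    return min(prev)


def unit(a, b):
    """Hypothetical penalty: 0 for identical characters, 1 otherwise."""
    return 0 if a == b else 1
```

With these assumptions, `wraparound_cost("abcbcabc", "abc", unit, 1)` charges a single gap for the missing `a`, and both functions agree on the inputs tried in the checks. A general penalty matrix or position-dependent gap penalties would need a more careful treatment than this sketch provides.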
|Title of host publication||Combinatorial Pattern Matching - 3rd Annual Symposium, Proceedings|
|Editors||Alberto Apostolico, Maxime Crochemore, Zvi Galil, Udi Manber|
|Number of pages||10|
|State||Published - 1992|
|Event||3rd Annual Symposium on Combinatorial Pattern Matching, 1992 - Tucson, United States (29 Apr 1992 → 1 May 1992)|
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Conference||3rd Annual Symposium on Combinatorial Pattern Matching, 1992|
|Period||29/04/92 → 1/05/92|
Bibliographical note
Funding Information:
String matching, together with its many generalizations, is a widely studied problem in computer science. One generalization that has been researched is approximate string matching: finding occurrences of a pattern in a text where differences (insertions and deletions) are allowed and matches may be defined by a function, with values in some range $(-r, r)$, that specifies how well a character from the pattern "matches" a given character in the text. (Positive values indicate "favorable matches", while negative values indicate "unfavorable matches".) Given a text $T$ of length $n$ and a pattern $P$ of length $m$, in the exact string matching problem one finds all the locations $t_i$ in the text such that $P = t_i t_{i+1} \cdots t_{i+m-1}$. When differences are allowed, however, every location in the text matches the pattern with some differences. A clarification of the definition of a "match" of the pattern is therefore needed. Most known algorithms for approximate string matching have two basic steps. In the first step each substring of the text receives a score, which reflects the quality of the match between the pattern and the given substring.
* Partially supported by NSF grant CCR-8908286.
** Partially supported by NSF grant CCR-9110255 and the New York State Science and Technology Foundation Center for Advanced Technology.
© Springer-Verlag Berlin Heidelberg 1992.
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science (all)