Text Indexing and Dictionary Matching with One Error

Amihood Amir, Dmitry Keselman, Gad M. Landau, Moshe Lewenstein, Noa Lewenstein, Michael Rodeh

Research output: Contribution to journal › Article › peer-review

Abstract

In the indexing problem, a text is preprocessed and subsequent queries of the form "Find all occurrences of pattern P in the text" are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem, a set of patterns is preprocessed and subsequent queries of the form "Find all occurrences of dictionary patterns in text T" are answered in time proportional to the length of the text and the number of occurrences. There exist efficient worst-case solutions for the indexing problem and the dictionary matching problem, but none that find approximate occurrences of the patterns, i.e., occurrences where the pattern is within a bounded edit (or Hamming) distance of the appropriate text location. In this paper we present a uniform deterministic solution to both the indexing and the general dictionary matching problem with one error. We preprocess the data in time O(n log² n), where n is the text size in the indexing problem and the dictionary size in the dictionary matching problem. Our query time for the indexing problem is O(m log n log log n + tocc), where m is the query string size and tocc is the number of occurrences. Our query time for the dictionary matching problem is O(n log³ d log log d + tocc), where n is the text size and d the dictionary size. The time bounds above apply to both bounded and unbounded alphabets.
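
To make the query semantics concrete, the following is a minimal brute-force sketch of what "occurrences with one error" means under Hamming distance. It is not the paper's algorithm: it does no preprocessing and runs in O(nm) time per query, so it serves only as a reference definition against which a faster index could be checked. The function names `hamming_at_most_one` and `find_with_one_mismatch` are hypothetical, and the edit-distance variant (which also allows one insertion or deletion) is not covered here.

```python
from typing import List


def hamming_at_most_one(a: str, b: str) -> bool:
    """Return True if equal-length strings a and b differ in at most one position."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > 1:
                return False  # stop early once a second mismatch is seen
    return True


def find_with_one_mismatch(text: str, pattern: str) -> List[int]:
    """Naive scan: start indices where pattern matches text with <= 1 mismatch.

    Assumption: this is an illustrative baseline only, with no index built
    over the text, unlike the preprocessed structures the abstract describes.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming_at_most_one(text[i:i + m], pattern)
    ]


if __name__ == "__main__":
    # "bana" (index 0) differs from "nana" in one position; "nana" (index 2) is exact.
    print(find_with_one_mismatch("banana", "nana"))
```

In this sketch an exact match counts as an occurrence as well, since zero mismatches is within the one-error bound.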

Original language: English
Pages (from-to): 309-325
Number of pages: 17
Journal: Journal of Algorithms
Volume: 37
Issue number: 2
State: Published - Nov 2000

Bibliographical note

Funding Information:
1. A preliminary version of this paper appeared in WADS’99 [5].
2. Partially supported by NSF Grant CCR-96-10170 and BSF Grant 96-00509.
3. Partially supported by NSF Grant CCR-9610238 and by the Israel Science Foundation founded by the Israeli Academy of Sciences and Humanities.
4. Supported by an Eshkol Fellowship from the Israel Ministry of Science and the Arts.
5. Partially supported by the Israel Ministry of Science and the Arts Grant 8560.

ASJC Scopus subject areas

  • Control and Optimization
  • Computational Mathematics
  • Computational Theory and Mathematics
