Abstract
We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of $d$ strings of length $m$ and an integer $k$, we must preprocess it into a data structure that answers the following queries: given a query string $Q$ of length $m$, find all strings in the dictionary that are at Hamming distance at most $k$ from $Q$. Chan and Lewenstein (CPM 2015) showed a data structure for $k = 1$ with optimal query time $O(m/w + occ)$, where $w$ is the size of a machine word and $occ$ is the size of the output. Their data structure occupies $O(wd \log^{1+\varepsilon} d)$ extra bits of space (beyond the entropy-bounded space required to store the dictionary strings). In this work we give a solution with similar bounds for a much wider range of values of $k$. Namely, we give a data structure that has $O(m/w + \log^k d + occ)$ query time and uses $O(wd \log^k d)$ extra bits of space.
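To make the query semantics concrete, the following is a minimal sketch of the naive baseline that the data structure improves on. It is not the authors' method. It does no preprocessing and scans all $d$ dictionary strings, comparing characters one at a time and stopping early once more than $k$ mismatches are found. A query therefore costs $O(dm)$ time in the worst case, compared with $O(m/w + \log^k d + occ)$ for the structure described above. The function names `hamming_at_most` and `lookup_with_mismatches` are hypothetical, chosen only for illustration.

```python
from typing import List


def hamming_at_most(a: str, b: str, k: int) -> bool:
    """Return True iff equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:  # early exit: already more than k mismatches
                return False
    return True


def lookup_with_mismatches(dictionary: List[str], query: str, k: int) -> List[int]:
    """Naive baseline (hypothetical helper): report the indices of all dictionary
    strings at Hamming distance at most k from the query.

    Every dictionary string must have length m = len(query). There is no
    preprocessing, so a query costs O(d * m) character comparisons in the worst case.
    """
    m = len(query)
    result = []
    for i, s in enumerate(dictionary):
        if len(s) != m:
            raise ValueError("all dictionary strings must have the query's length m")
        if hamming_at_most(s, query, k):
            result.append(i)
    return result


if __name__ == "__main__":
    D = ["abcd", "abce", "xbcd", "wxyz"]
    print(lookup_with_mismatches(D, "abcd", 1))  # [0, 1, 2]
    print(lookup_with_mismatches(D, "abcd", 0))  # [0]
```

With $k = 1$, the strings "abce" and "xbcd" each differ from the query "abcd" in a single position, so they are reported together with the exact match. The string "wxyz" differs in all four positions and is rejected.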
Original language | English |
---|---|
Title of host publication | 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018 |
Editors | Igor Potapov, James Worrell, Paul Spirakis |
Publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing |
Pages | 66:1–66:15 |
ISBN (Print) | 9783959770866 |
DOIs | |
State | Published - 1 Aug 2018 |
Event | 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018 - Liverpool, United Kingdom. Duration: 27 Aug 2018 → 31 Aug 2018 |
Publication series
Name | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|
Volume | 117 |
ISSN (Print) | 1868-8969 |
Conference
Conference | 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018 |
---|---|
Country/Territory | United Kingdom |
City | Liverpool |
Period | 27/08/18 → 31/08/18 |
Bibliographical note
Publisher Copyright: © Paweł Gawrychowski, Gad M. Landau, and Tatiana Starikovskaya.
Keywords
- Compact data structures
- Dictionary look-up
- Hamming distance
ASJC Scopus subject areas
- Software