Fast entropy-bounded string dictionary look-up with mismatches

Paweł Gawrychowski, Gad M. Landau, Tatiana Starikovskaya

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of d strings of length m and an integer k, we must preprocess it into a data structure to answer the following queries: Given a query string Q of length m, find all strings in the dictionary that are at Hamming distance at most k from Q. Chan and Lewenstein (CPM 2015) showed a data structure for k = 1 with optimal query time O(m/w + occ), where w is the size of a machine word and occ is the size of the output. The data structure occupies O(wd log^{1+ε} d) extra bits of space (beyond the entropy-bounded space required to store the dictionary strings). In this work we give a solution with similar bounds for a much wider range of values k. Namely, we give a data structure that has O(m/w + log^k d + occ) query time and uses O(wd log^k d) extra bits of space.
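To make the query in the abstract concrete, here is a minimal Python sketch of the problem statement only. It is a naive linear scan that takes O(dm) time per query and is not the data structure from the paper; the function names `hamming_distance` and `lookup_with_mismatches` and the sample dictionary are hypothetical, chosen purely for illustration.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Return the number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def lookup_with_mismatches(dictionary: List[str], query: str, k: int) -> List[str]:
    """Naive baseline (an assumption for illustration, not the paper's method):
    compare the query against every dictionary string and keep those within
    Hamming distance k."""
    return [s for s in dictionary if hamming_distance(s, query) <= k]


# Hypothetical dictionary of d = 4 strings, each of length m = 4.
dictionary = ["abcd", "abce", "xbcd", "wxyz"]
print(lookup_with_mismatches(dictionary, "abcd", 1))
# prints ['abcd', 'abce', 'xbcd']
```

The scan touches every string on every query, which is the cost the bounds in the abstract avoid: the query time there depends on m/w, log^k d, and the output size occ rather than on d times m.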

Original language: English
Title of host publication: 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018
Editors: Igor Potapov, James Worrell, Paul Spirakis
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Pages: 66:1–66:15
ISBN (Print): 9783959770866
State: Published - 1 Aug 2018
Event: 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018 - Liverpool, United Kingdom
Duration: 27 Aug 2018 – 31 Aug 2018

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 117
ISSN (Print): 1868-8969

Conference

Conference: 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018
Country/Territory: United Kingdom
City: Liverpool
Period: 27/08/18 – 31/08/18

Bibliographical note

Publisher Copyright:
© Paweł Gawrychowski, Gad M. Landau, and Tatiana Starikovskaya.

Keywords

  • Compact data structures
  • Dictionary look-up
  • Hamming distance

ASJC Scopus subject areas

  • Software
