## Abstract

We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of $d$ strings of length $m$ and an integer $k$, we must preprocess it into a data structure that answers the following queries: given a query string $Q$ of length $m$, find all strings in the dictionary that are at Hamming distance at most $k$ from $Q$. Chan and Lewenstein (CPM 2015) showed a data structure for $k = 1$ with optimal query time $O(m/w + \mathit{occ})$, where $w$ is the size of a machine word and $\mathit{occ}$ is the size of the output. The data structure occupies $O(wd \log^{1+\varepsilon} d)$ extra bits of space (beyond the entropy-bounded space required to store the dictionary strings). In this work we give a solution with similar bounds for a much wider range of values of $k$. Namely, we give a data structure that has $O(m/w + \log^{k} d + \mathit{occ})$ query time and uses $O(wd \log^{k} d)$ extra bits of space.
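To make the query semantics concrete, the following is a minimal sketch of a classic pigeonhole filter for the same problem. It is *not* the data structure from this paper and does not achieve the stated bounds: it rests on the standard observation that if $\mathrm{Ham}(Q, S) \le k$, then splitting positions into $k+1$ blocks leaves at least one block where $Q$ and $S$ agree exactly. Every block is indexed in a hash table, and the candidates are then verified. The names `MismatchDictionary`, `block_bounds`, and `hamming` are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions where equal-length strings a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def block_bounds(m: int, k: int) -> list[tuple[int, int]]:
    """Split positions 0..m-1 into k+1 contiguous, nearly equal blocks."""
    parts = k + 1
    bounds = []
    start = 0
    for i in range(parts):
        size = m // parts + (1 if i < m % parts else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


class MismatchDictionary:
    """Hypothetical pigeonhole index (illustration only, not the paper's method).

    If Hamming(Q, S) <= k, at least one of the k+1 blocks of Q equals the
    corresponding block of S, so every answer shows up as a candidate.
    """

    def __init__(self, strings: list[str], k: int):
        if not strings:
            raise ValueError("dictionary must be non-empty")
        self.m = len(strings[0])
        if any(len(s) != self.m for s in strings):
            raise ValueError("all dictionary strings must have length m")
        self.k = k
        self.strings = strings
        self.bounds = block_bounds(self.m, k)
        # One hash table per block: block content -> ids of strings having it.
        self.tables = [defaultdict(list) for _ in self.bounds]
        for idx, s in enumerate(strings):
            for table, (lo, hi) in zip(self.tables, self.bounds):
                table[s[lo:hi]].append(idx)

    def query(self, q: str) -> list[int]:
        """Return sorted ids of dictionary strings within distance k of q."""
        if len(q) != self.m:
            raise ValueError("query must have length m")
        candidates = set()
        for table, (lo, hi) in zip(self.tables, self.bounds):
            # .get does not insert into the defaultdict.
            candidates.update(table.get(q[lo:hi], ()))
        # Verification step: filter out false positives.
        return sorted(i for i in candidates if hamming(self.strings[i], q) <= self.k)
```

For example, with the dictionary `["abcd", "abce", "xbcd", "wxyz"]` and $k = 1$, the query `"abcd"` returns `[0, 1, 2]`. The verification step is where this baseline falls short of the bounds above: in the worst case many candidates share a block with $Q$, and each costs $O(m)$ time to check.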

Original language | English |
---|---|
Title of host publication | 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018 |
Editors | Igor Potapov, James Worrell, Paul Spirakis |
Publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing |
Pages | 66:1–66:15 |
ISBN (Print) | 9783959770866 |
DOIs | |
State | Published - 1 Aug 2018 |
Event | 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018, Liverpool, United Kingdom (27 Aug 2018 → 31 Aug 2018) |

### Publication series

Name | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|
Volume | 117 |
ISSN (Print) | 1868-8969 |

### Conference

Conference | 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018 |
---|---|
Country/Territory | United Kingdom |
City | Liverpool |
Period | 27/08/18 → 31/08/18 |

### Bibliographical note

Publisher Copyright: © Paweł Gawrychowski, Gad M. Landau, and Tatiana Starikovskaya.

## Keywords

- Compact data structures
- Dictionary look-up
- Hamming distance

## ASJC Scopus subject areas

- Software