Abstract
We study the problem of indexing a text $T[1..n] \in \Sigma^n$ so that, later, given a query regular expression pattern $R$ of size $m = |R|$, we can report all the $\mathrm{occ}$ substrings $T[i..j]$ of $T$ matching $R$. The problem is known to be hard for arbitrary patterns $R$, so in this paper we consider the following two types of patterns: (1) character-class Kleene-star patterns of the form $P_1 D^* P_2$, where $P_1$ and $P_2$ are strings and $D = \{c_1, \ldots, c_k\} \subset \Sigma$ is a character class (shorthand for the regular expression $(c_1|c_2|\cdots|c_k)$), and (2) string Kleene-star patterns of the form $P_1 P^* P_2$, where $P$, $P_1$ and $P_2$ are strings. In case (1), we describe an index of $O(n \log^{1+\epsilon} n)$ space (for any constant $\epsilon > 0$) solving queries in time $O(m + \log n / \log\log n + \mathrm{occ})$ on constant-sized alphabets. We also describe a general solution for any alphabet size. This result is conditioned on the existence of an anchor: a character of $P_1 P_2$ that does not belong to $D$. We justify this assumption by proving that no efficient indexing solution can exist if an anchor is not present unless the Set Disjointness Conjecture fails. In case (2), we describe an index of size $O(n)$ answering queries in time $O(m + (\mathrm{occ} + 1) \log^{\epsilon} n)$ on any alphabet size.
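To make the query semantics concrete, the following is a minimal sketch of a brute-force baseline, not the index described in the abstract. It builds the two pattern types as ordinary Python regular expressions, reports every matching substring as a 1-indexed pair (i, j), and checks the anchor condition. All function names here are hypothetical and chosen only for illustration; the quadratic scan is exactly the cost that an index is meant to avoid.

```python
import re

def char_class_regex(p1, d, p2):
    """Regex for a character-class Kleene-star pattern P1 D* P2."""
    cls = "".join(re.escape(c) for c in sorted(d))
    star = f"[{cls}]*" if cls else ""  # an empty class contributes nothing
    return re.escape(p1) + star + re.escape(p2)

def string_star_regex(p1, p, p2):
    """Regex for a string Kleene-star pattern P1 P* P2."""
    star = f"(?:{re.escape(p)})*" if p else ""
    return re.escape(p1) + star + re.escape(p2)

def has_anchor(p1, d, p2):
    """True if some character of P1 P2 does not belong to D."""
    return any(c not in d for c in p1 + p2)

def naive_occurrences(text, regex):
    """Report all 1-indexed (i, j) such that T[i..j] matches the regex.

    Brute force: tries every non-empty substring, so it performs a
    quadratic number of match attempts. Illustration only.
    """
    pat = re.compile(regex)
    n = len(text)
    return [
        (i + 1, j + 1)
        for i in range(n)
        for j in range(i, n)
        if pat.fullmatch(text, i, j + 1)
    ]

if __name__ == "__main__":
    # Case (1): P1 = "a", D = {"b"}, P2 = "a"; the anchor is "a".
    print(has_anchor("a", {"b"}, "a"))
    print(naive_occurrences("abba", char_class_regex("a", {"b"}, "a")))
    # Case (2): P1 = "x", P = "ab", P2 = "y".
    print(naive_occurrences("xababy xy", string_star_regex("x", "ab", "y")))
```

In the first query the only occurrence is the whole text "abba", while a pattern such as P1 = "b", D = {"b"}, P2 = "b" has no anchor and falls outside the conditional guarantee stated for case (1).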
| Original language | English |
|---|---|
| Title of host publication | 36th Annual Symposium on Combinatorial Pattern Matching, CPM 2025 |
| Editors | Paola Bonizzoni, Veli Mäkinen |
| Publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing |
| ISBN (Electronic) | 9783959773690 |
| DOIs | |
| State | Published - 10 Jun 2025 |
| Event | 36th Annual Symposium on Combinatorial Pattern Matching, CPM 2025 - Milan, Italy; Duration: 17 Jun 2025 → 19 Jun 2025 |
Publication series
| Name | Leibniz International Proceedings in Informatics, LIPIcs |
|---|---|
| Volume | 331 |
| ISSN (Print) | 1868-8969 |
Conference
| Conference | 36th Annual Symposium on Combinatorial Pattern Matching, CPM 2025 |
|---|---|
| Country/Territory | Italy |
| City | Milan |
| Period | 17/06/25 → 19/06/25 |
Bibliographical note
Publisher Copyright: © Hideo Bannai, Philip Bille, Inge Li Gørtz, Gad M. Landau, Gonzalo Navarro, Nicola Prezza, Teresa Anna Steiner, and Simon Rumle Tarnow.
Keywords
- Text indexing
- data structures
- regular expressions
ASJC Scopus subject areas
- Software
Title
- Text Indexing for Simple Regular Expressions