Abstract
Cadences are syntactic regularities in strings, in the same family as periods, squares, and repetitions. We say a string has a cadence if a certain character is repeated at regular intervals, possibly with intervening occurrences of that character. We call the cadence anchored if the first interval must be the same length as the others. Although the combinatorial properties of cadences have been explored, little work has been done on the efficiency of their discovery. Recently, implementations involving cadences have appeared in works on phylogenetic reconstruction, periodic subgraph mining, and monitoring events in computer networks. In this paper we begin a systematic study of the efficiency of finding cadences. We first give some basic definitions; we then give a sub-quadratic algorithm for determining whether a string has any cadence consisting of at least three occurrences of a character, and a nearly linear algorithm for finding all anchored cadences; finally, we propose a data structure that captures many features of cadences and allows for the efficient detection of many types of cadences. In particular, all sub-cadences can be detected and reported in time proportional to the sum of their lengths.
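To make the notion of an anchored cadence concrete, here is a minimal brute-force sketch in Python. It is an illustration only and not the algorithm from the paper. It assumes one reading of the abstract's wording: with 1-indexed positions, an anchored cadence of period d is a run in which positions d, 2d, 3d, ... up to the end of the string all hold the same character. The function name `anchored_cadences` and the parameter `min_occurrences` are hypothetical and introduced here only for the example.

```python
def anchored_cadences(s, min_occurrences=3):
    """Return every period d for which 1-indexed positions d, 2d, 3d, ...
    of s all hold the same character (assumed reading of "anchored cadence").

    Brute-force illustration, not the paper's algorithm.
    """
    n = len(s)
    periods = []
    # Only periods d with floor(n / d) >= min_occurrences can qualify.
    for d in range(1, n // min_occurrences + 1):
        # 1-indexed positions d, 2d, ... map to 0-indexed d-1, 2d-1, ...
        positions = range(d - 1, n, d)
        if len({s[p] for p in positions}) == 1:
            periods.append(d)
    return periods


if __name__ == "__main__":
    print(anchored_cadences("abcabcabc"))  # [3]
    print(anchored_cadences("abababab"))   # [2]
```

In `"abcabcabc"`, positions 3, 6, and 9 all hold `c`, so period 3 is reported; no other period up to 3 gives three equal, evenly spaced characters starting at position d.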
Original language | English |
---|---|
Pages (from-to) | 4-8 |
Number of pages | 5 |
Journal | Theoretical Computer Science |
Volume | 698 |
DOIs | |
State | Published - 25 Oct 2017 |
Bibliographical note
Publisher Copyright: © 2017 Elsevier B.V.
Keywords
- Cadences
- Pattern mining
- String algorithms
- String regularities
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science