Cadences are syntactic regularities in strings, of the family of periods, squares, and repetitions. We say a string has a cadence if a certain character is repeated at regular intervals, possibly with intervening occurrences of that character. We call the cadence anchored if the first interval must be the same length as the others. Although cadences' combinatorial properties have been explored, little work has been done on the efficiency of their discovery. Recently, implementations involving cadences have appeared in works on phylogenetic reconstruction, periodic subgraph mining, and monitoring events in computer networks. In this paper we begin a systematic study of the efficiency of finding cadences. We first give some basic definitions; we then give a sub-quadratic algorithm for determining whether a string has any cadence consisting of at least three occurrences of a character, and a nearly linear algorithm for finding all anchored cadences; finally, we propose a data structure that captures many features of cadences and allows for the efficient detection of many types of cadences. In particular, all sub-cadences can be detected and reported in time proportional to the sum of their lengths.
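To make the notion of an anchored cadence concrete, the following is a minimal Python sketch and not the algorithm from the paper. It assumes 1-indexed positions and reads "anchored" as: the positions d, 2d, 3d, ... up to the end of the string all hold the same character. The function name `anchored_cadences` and the `min_occurrences` threshold are hypothetical, introduced only for illustration.

```python
def anchored_cadences(s, min_occurrences=2):
    """Return every interval d for which the 1-indexed positions
    d, 2d, 3d, ... of s all hold the same character.

    Assumption (not taken from the paper): an interval only counts if it
    covers at least `min_occurrences` positions, which rules out the
    trivial case where d is so large that only one position is checked.
    """
    n = len(s)
    result = []
    for d in range(1, n + 1):
        occurrences = n // d  # how many multiples of d fit in the string
        if occurrences < min_occurrences:
            break  # any larger d covers even fewer positions
        ch = s[d - 1]  # character at 1-indexed position d
        if all(s[p - 1] == ch for p in range(d, n + 1, d)):
            result.append(d)
    return result


if __name__ == "__main__":
    for text in ("abab", "aaaa", "abcabc"):
        print(text, anchored_cadences(text))
```

Under these assumptions the direct check inspects at most n/1 + n/2 + ... + n/n positions for a string of length n, which is a harmonic sum and therefore O(n log n) in total.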
Number of pages: 5
Journal: Theoretical Computer Science
State: Published - 25 Oct 2017
Bibliographical note
Funding Information:
Many thanks to the organizers and other participants of the AxA workshop, to Margaret Gagie for proofreading, and to the anonymous reviewers for their comments. The first and fourth authors were partly supported by ISF grant 571/14 and BSF grant 2014028. The second author was partly supported by Academy of Finland grant 268324. This work is dedicated to the memory of our good friend and mentor Alberto.
© 2017 Elsevier B.V.
Keywords
- Pattern mining
- String algorithms
- String regularities
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science (all)