String cadences

Amihood Amir, Alberto Apostolico, Travis Gagie, Gad M. Landau

Research output: Contribution to journal › Article › peer-review

Abstract

Cadences are syntactic regularities in strings, in the same family as periods, squares, and repetitions. We say a string has a cadence if a certain character is repeated at regular intervals, possibly with intervening occurrences of that character. We call the cadence anchored if the first interval must be the same length as the others. Although the combinatorial properties of cadences have been explored, little work has been done on the efficiency of their discovery. Recently, implementations involving cadences have appeared in works on phylogenetic reconstruction, periodic subgraph mining, and monitoring events in computer networks. In this paper we begin a systematic study of the efficiency of finding cadences. We first give some basic definitions; we then give a sub-quadratic algorithm for determining whether a string has any cadence consisting of at least three occurrences of a character, and a nearly linear algorithm for finding all anchored cadences; finally, we propose a data structure that captures many features of cadences and allows for the efficient detection of many types of cadences. In particular, all sub-cadences can be detected and reported in time proportional to the sum of their lengths.
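
To make the definitions above concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm, and the formalization is an assumption read off the abstract: a cadence is taken to be a pair (i, d), 1-indexed, with i ≤ d, such that positions i, i + d, i + 2d, ... up to the end of the string all hold the same character, and it is anchored when i = d. The function names `cadences` and `anchored_cadences` and the parameter `min_occurrences` are hypothetical.

```python
def cadences(s, min_occurrences=3):
    """Naively list (start, step) pairs, 1-indexed, that form a cadence.

    Assumed definition: start <= step, and every position
    start, start + step, start + 2*step, ... up to len(s)
    holds the same character. Other positions are unconstrained,
    so intervening occurrences of the character are allowed.
    """
    n = len(s)
    found = []
    for d in range(1, n + 1):
        for i in range(1, d + 1):
            positions = range(i, n + 1, d)
            if len(positions) < min_occurrences:
                continue
            if all(s[p - 1] == s[i - 1] for p in positions):
                found.append((i, d))
    return found


def anchored_cadences(s, min_occurrences=3):
    """Naively list steps d for which (d, d) is an anchored cadence.

    Only positions d, 2d, 3d, ... are inspected, so the total work is
    about n/1 + n/2 + n/3 + ..., which is O(n log n).
    """
    n = len(s)
    found = []
    for d in range(1, n // min_occurrences + 1):
        positions = range(d, n + 1, d)
        if all(s[p - 1] == s[d - 1] for p in positions):
            found.append(d)
    return found


print(cadences("xaxaxa"))           # [(1, 2), (2, 2)]
print(anchored_cadences("xaxaxa"))  # [2]
```

The general check above takes quadratic time in the worst case, which is the baseline that a sub-quadratic detection algorithm improves on. The anchored check is cheaper even when done naively, because fixing i = d leaves only one candidate per step.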

Original language: English
Pages (from-to): 4-8
Number of pages: 5
Journal: Theoretical Computer Science
Volume: 698
State: Published - 25 Oct 2017

Bibliographical note

Publisher Copyright:
© 2017 Elsevier B.V.

Keywords

  • Cadences
  • Pattern mining
  • String algorithms
  • String regularities

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
