Sequence similarity measures based on bounded hamming distance

Alberto Apostolico, Concettina Guerra, Gad M. Landau, Cinzia Pizzi

Research output: Contribution to journal › Article › peer-review

Abstract

A growing number of measures of sequence similarity are being based on some underlying notion of relative compressibility. Within this paradigm, similar sequences are expected to share a large number of common substrings, or subsequences, or more complex patterns or motifs, and so on. In this paper, measures of sequence similarity are introduced and studied in which patterns in a pair are considered similar if they coincide up to a preset number of mismatches, that is, within a bounded Hamming distance. It is shown here that for some such measures bounds are achievable that are slightly better than O(n²). Preliminary experiments demonstrate the potential applicability to phylogeny and classification of these similarity measures.
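To make the notion of "coinciding up to a preset number of mismatches" concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It computes the length of the longest pair of equal-length substrings of two strings whose Hamming distance is at most k, by scanning every alignment diagonal with a sliding window. This baseline runs in quadratic time and does not achieve the improved bounds mentioned in the abstract. The function name and parameters are assumptions chosen for illustration.

```python
def longest_common_substring_k_mismatches(x: str, y: str, k: int) -> int:
    """Length of the longest pair of equal-length substrings of x and y
    whose Hamming distance is at most k (hypothetical helper, naive baseline)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n, m = len(x), len(y)
    best = 0
    # Each diagonal d = i - j aligns x[i] with y[j].
    for d in range(-(m - 1), n):
        i = max(d, 0)           # start of the diagonal in x
        j = i - d               # start of the diagonal in y
        length = min(n - i, m - j)
        left = 0                # left edge of the sliding window
        mismatches = 0          # mismatches inside the current window
        for right in range(length):
            if x[i + right] != y[j + right]:
                mismatches += 1
            # Shrink the window until it holds at most k mismatches.
            while mismatches > k:
                if x[i + left] != y[j + left]:
                    mismatches -= 1
                left += 1
            best = max(best, right - left + 1)
    return best


print(longest_common_substring_k_mismatches("abcde", "abxde", 0))  # 2
print(longest_common_substring_k_mismatches("abcde", "abxde", 1))  # 5
```

With k = 0 this reduces to the ordinary longest common substring. Allowing one mismatch lets the two five-letter strings above match in full.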

Original language: English
Pages (from-to): 76-90
Number of pages: 15
Journal: Theoretical Computer Science
Volume: 638
State: Published - 25 Jul 2016

Bibliographical note

Publisher Copyright:
© 2016 Elsevier B.V.

Keywords

  • Alignment free distances
  • Binary string
  • Longest common substring
  • Mismatches
  • Pattern matching
  • String comparison

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
