Efficient text fingerprinting via Parikh mapping

Amihood Amir, Alberto Apostolico, Gad M. Landau, Giorgio Satta

Research output: Contribution to journal › Article › peer-review


We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string of length n over a finite, ordered alphabet Σ, and S' is a substring of S, then the fingerprint of S' is the subset φ of Σ consisting of precisely the symbols that appear in S'. In this paper we show efficient methods for answering various queries on fingerprint statistics. Our preprocessing takes time O(n |Σ| log n log |Σ|) and enables answering the following queries: (1) Given an integer k, compute the number of distinct fingerprints of size k in time O(1). (2) Given a set φ ⊆ Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in time O(|Σ| log n).
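To make the definitions concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm and does not achieve the stated bounds: it enumerates all substrings in roughly O(n² |Σ|) time. The function names are hypothetical, and the sketch assumes that an "occurrence" means a (start, end) position pair in S, which may differ from the paper's precise definition.

```python
from collections import Counter


def fingerprint_table(s):
    """Map each fingerprint (a frozenset of symbols) to the number of
    (start, end) position pairs whose substring has exactly that symbol set.

    Brute-force baseline for illustration only; the counting convention
    for occurrences is an assumption, not taken from the paper.
    """
    table = Counter()
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])                 # extend substring s[i..j] by one symbol
            table[frozenset(seen)] += 1    # record the fingerprint of s[i..j]
    return table


def count_fingerprints_of_size(table, k):
    """Query (1): number of distinct fingerprints containing exactly k symbols."""
    return sum(1 for fp in table if len(fp) == k)


def count_occurrences(table, phi):
    """Query (2): number of substring positions whose fingerprint equals phi."""
    return table.get(frozenset(phi), 0)


table = fingerprint_table("abac")
print(count_fingerprints_of_size(table, 2))  # fingerprints {a, b} and {a, c}
print(count_occurrences(table, "ab"))        # substrings "ab", "aba", "ba"
```

For S = "abac", the distinct fingerprints are {a}, {b}, {c}, {a, b}, {a, c}, and {a, b, c}. A table like this is useful mainly as a reference for checking a faster implementation on small inputs.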

Original language: English
Pages (from-to): 409-421
Number of pages: 13
Journal: Journal of Discrete Algorithms
Issue number: 5-6
State: Published - Oct 2003

Bibliographical note

Funding Information:
Giorgio Satta's work was supported in part by MURST under project PRIN: BioInformatica e Ricerca Genomica and by University of Padova, under project Sviluppo di Sistemi ad Addestramento Automatico per l'Analisi del Linguaggio Naturale.

Amihood Amir was partially supported by NSF grant CCR-01-04494, BSF grant 96-00509, and an Israel–Italy exchange scientist grant.

Alberto Apostolico's work was supported in part by NSF Grant CCR-9700276, by MURST under project PRIN: BioInformatica e Ricerca Genomica, by the University of Padova under project Development of Novel Pattern Discovery Algorithms and Software, and by an Israel–Italy exchange scientist grant.

This research was performed during exchange visits conducted, respectively, by the first and third authors at the University of Padova, and by the second author at the Universities of Bar-Ilan and Haifa, as part of an Israel–Italy exchange scientist grant jointly funded by the Israel Ministry of Science and the National Research Council of Italy.

Gad Landau was partially supported by NSF grants CCR-9610238 and CCR-0104307, by NATO Science Programme grant PST.CLG.977017, by Israel Science Foundation grants 173/98 and 282/01, by the FIRST Foundation of the Israel Academy of Science and Humanities, by an IBM Faculty Partnership Award, and by an Israel–Italy exchange scientist grant.


Keywords

  • Combinatorial algorithms on words
  • Design and analysis of algorithms

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Discrete Mathematics and Combinatorics
  • Computational Theory and Mathematics

