The method of N-grams in large-scale clustering of DNA texts

Z. Volkovich, V. Kirzhner, A. Bolshoy, E. Nevo, A. Korol

Research output: Contribution to journal › Article › peer-review


This paper is devoted to techniques for clustering texts based on comparing their N-gram vocabularies. In contrast to the regular N-gram approach, the proposed method counts imperfect occurrences of N-grams in a text, allowing up to a given number of mismatches. We demonstrate that this approach substantially improves the resolving capacity of the N-gram method for DNA texts. Additionally, we discuss a scheme for the combined use of different types of clustering techniques to verify partition quality.
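As a rough illustration of what counting N-gram occurrences "up to a number of mismatches" can look like, the sketch below counts, for each word in a vocabulary, the windows of a text that differ from it in at most a given number of positions. It is a minimal sketch and not the authors' implementation: the use of Hamming distance as the mismatch measure, the function names (`hamming`, `mismatch_spectrum`, `spectrum_distance`), the toy 2-gram vocabulary, and the normalised L1 distance between spectra are all assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_spectrum(text, words, max_mismatches):
    """For each word, count the windows of `text` that differ from it
    in at most `max_mismatches` positions (assumed mismatch rule)."""
    n = len(words[0])
    counts = {w: 0 for w in words}
    for i in range(len(text) - n + 1):
        window = text[i:i + n]
        for w in words:
            if hamming(window, w) <= max_mismatches:
                counts[w] += 1
    return counts


def spectrum_distance(p, q):
    """L1 distance between two spectra, each normalised to sum to 1
    (an illustrative choice of distance, not taken from the paper)."""
    sp = sum(p.values()) or 1
    sq = sum(q.values()) or 1
    return sum(abs(p[w] / sp - q[w] / sq) for w in p)


# Toy vocabulary: all 16 DNA 2-grams.
words = ["".join(t) for t in product("ACGT", repeat=2)]

exact = mismatch_spectrum("ACGTACGT", words, 0)  # regular N-gram counts
fuzzy = mismatch_spectrum("ACGTACGT", words, 1)  # one mismatch allowed
```

With `max_mismatches` set to 0 this reduces to ordinary N-gram counting. Allowing mismatches gives nonzero counts to words that never occur exactly, such as `AA` in the toy text. A matrix of pairwise `spectrum_distance` values between texts could then be passed to any standard clustering routine.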

Original language: English
Pages (from-to): 1902-1912
Number of pages: 11
Journal: Pattern Recognition
Issue number: 11
State: Published - Nov 2005


  • Clustering
  • Compositional spectra
  • Genome comparisons
  • N-grams
  • Strings mismatching

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

