Large-scale genome clustering across life based on a linguistic approach

Valery Kirzhner, Alexander Bolshoy, Zeev Volkovich, Abraham Korol, Eviatar Nevo

Research output: Contribution to journal › Article › peer-review


With the availability of genome sequences, new phylogenetic reconstructions become possible that can reveal genomic relationships among organisms. According to the compositional-spectra (CS) approach proposed in our previous studies, any genomic sequence can be characterized by the distribution of frequencies with which a set of words (oligonucleotides) occurs in it under imperfect matching. In the current application of CS analysis, we analyzed the cluster structure of genomes across life. The compositional spectra show a clear three-group clustering of the compared prokaryotic and eukaryotic genomes. Unexpectedly, this grouping differs substantially from the classical Universal Tree of Life structure represented by the common kingdoms known as Eubacteria, Archaebacteria, and Eukarya. The revealed CS clustering is highly stable, which putatively reflects its objective nature; its biological significance remains enigmatic and may result from convergent evolution driven by ecological selection. We believe that our approach provides a new and broader perspective, compared with traditional methods, for extracting genomic information of high evolutionary relevance.
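
As an illustration of the idea of a compositional spectrum, the following minimal Python sketch counts approximate (imperfect) occurrences of each word in a sequence and normalizes the counts into a frequency distribution. It is not the authors' implementation: the function names, the use of Hamming distance as the mismatch criterion, the toy word set, the mismatch threshold, and the L1 distance between spectra are all assumptions made for this example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def compositional_spectrum(seq, words, max_mismatches):
    """Normalized frequencies of approximate occurrences of each word.

    A window of seq counts as an occurrence of a word when it differs
    from the word in at most max_mismatches positions (an assumed
    reading of "imperfect matching").
    """
    counts = []
    for word in words:
        k = len(word)
        hits = sum(
            1
            for i in range(len(seq) - k + 1)
            if hamming(seq[i:i + k], word) <= max_mismatches
        )
        counts.append(hits)
    total = sum(counts)
    # Return an all-zero spectrum when no word matches anywhere.
    return [c / total if total else 0.0 for c in counts]


def spectrum_distance(p, q):
    """L1 distance between two spectra (an assumed choice of metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    words = ["ACG", "CGT", "TTT"]  # hypothetical toy word set
    s1 = compositional_spectrum("ACGTACGTACGT", words, max_mismatches=1)
    s2 = compositional_spectrum("TTTTACGTTTTT", words, max_mismatches=1)
    print(s1, s2, spectrum_distance(s1, s2))
```

Pairwise distances of this kind, computed for many genomes, could then be passed to any standard clustering procedure; the choice of procedure is left open here.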

Original language: English
Pages (from-to): 208-222
Number of pages: 15
Issue number: 3
State: Published - Sep 2005


  • Clustering
  • Comparative genomics
  • Ecological convergence
  • Occurrences
  • Oligonucleotide
  • Sequence comparisons

ASJC Scopus subject areas

  • Statistics and Probability
  • Modeling and Simulation
  • General Biochemistry, Genetics and Molecular Biology
  • Applied Mathematics

