A supervised learning approach for taxonomic classification of core-photosystem-II genes and transcripts in the marine environment

Shani Tzahor, Dikla Man-Aharonovich, Benjamin C. Kirkup, Tali Yogev, Ilana Berman-Frank, Martin F. Polz, Oded Béjà, Yael Mandel-Gutfreund

Research output: Contribution to journal › Article › peer-review


Background: Cyanobacteria of the genera Synechococcus and Prochlorococcus play a key role in marine photosynthesis, which contributes to the global carbon cycle and to the world oxygen supply. Recently, genes encoding the photosystem II reaction center (psbA and psbD) were found in cyanophage genomes. This finding suggested that horizontal transfer of these genes may increase phage fitness. To date, only a very small percentage of marine bacteria and phages has been cultured. Thus, mapping genomic data extracted directly from the environment to its taxonomic origin is necessary for a better understanding of phage-host relationships and dynamics.

Results: To achieve accurate and rapid taxonomic classification, we employed a computational approach combining a multi-class Support Vector Machine (SVM) with a codon usage position specific scoring matrix (cuPSSM). Our method has been applied successfully to classify core-photosystem-II gene fragments, including partial sequences coming directly from the ocean, into seven different taxonomic classes. Applying the method to a large set of DNA and RNA psbA clones from the Mediterranean Sea, we studied the distribution of cyanobacterial psbA genes and transcripts in their natural environment. Using our approach, we were able to simultaneously examine taxonomic and ecological distributions in the marine environment.

Conclusion: The ability to accurately classify the origin of individual genes and transcripts coming directly from the environment is of great importance in studying marine ecology. The classification method presented in this paper could be applied further to classify other genes amplified from the environment for which training data are available.
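The idea of a codon usage position specific scoring matrix can be illustrated with a minimal sketch. The Python below is not the authors' implementation: it assumes in-frame, equal-length training sequences per class, uses a simple pseudocount, and replaces the multi-class SVM stage with a plain highest-log-likelihood rule. The function names (`build_cupssm`, `score`, `classify`), the class labels, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

N_CODONS = 64  # size of the codon alphabet used for smoothing


def codons(seq):
    """Split an in-frame DNA sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def build_cupssm(seqs, pseudocount=1.0):
    """Build per-position codon log-probabilities from one class's training sequences."""
    columns = zip(*(codons(s) for s in seqs))  # one column per codon position
    pssm = []
    for column in columns:
        counts = Counter(column)
        total = len(column) + pseudocount * N_CODONS
        logp = {c: math.log((n + pseudocount) / total) for c, n in counts.items()}
        logp[None] = math.log(pseudocount / total)  # fallback for unseen codons
        pssm.append(logp)
    return pssm


def score(pssm, fragment, offset=0):
    """Sum the log-probabilities of a fragment whose first codon sits at position `offset`."""
    total = 0.0
    for pos, codon in enumerate(codons(fragment), start=offset):
        column = pssm[pos]
        total += column.get(codon, column[None])
    return total


def classify(models, fragment, offset=0):
    """Return the class label whose matrix gives the fragment the highest score."""
    return max(models, key=lambda label: score(models[label], fragment, offset))


# Invented toy data: two hypothetical classes that encode the same peptide
# (Met-Ala-Gly) but prefer different synonymous codons.
training = {
    "class_A": ["ATGGCTGGT", "ATGGCTGGC", "ATGGCAGGT"],
    "class_B": ["ATGGCCGGG", "ATGGCGGGG", "ATGGCCGGA"],
}
models = {label: build_cupssm(seqs) for label, seqs in training.items()}

# A partial fragment covering codon positions 1-2 only.
print(classify(models, "GCTGGT", offset=1))
```

In this sketch a fragment is assigned to whichever class's matrix explains its codon choices best. In a two-stage design like the one the abstract describes, the per-class scores could instead serve as input features for a multi-class SVM; that step is omitted here.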

Original language: English
Article number: 229
Journal: BMC Genomics
State: Published - 16 May 2009
Externally published: Yes

Bibliographical note

Funding Information:
We would like to thank the sequencing team at the MPI for Molecular Genetics in Berlin for their technical support. We are also grateful to the captain and engineer of the R/V Mediterranean Explorer and Naama Dekel for their technical help during the cruises, and to the EcoOcean Marine Research and Education Organization. We are grateful to Feng Chen and Kui Wang for allowing us to use their psbA primers prior to publication, and to Debbie Lindell for her valuable input. We thank Itai Sharon for great help with data extraction, for valuable discussions throughout the work, and for his input to the manuscript. This research is part of the requirements for a PhD thesis for T.Y. at Bar-Ilan University. This work was partially supported by a grant from the Israeli Ministry of Science and Technology, an EMBO YIP award (O.B.), a Marine Genomics Network of Excellence EU grant (O.B.), an ISF grant (I.B.-F. and O.B.), and the United States National Science Foundation, Biological Oceanography and Moore Foundation (M.F.P.). S.T. was supported by the Interdisciplinary Biotechnology Program at the Technion.

ASJC Scopus subject areas

  • Genetics
  • Biotechnology
