The homology kernel: A biologically motivated sequence embedding into Euclidean space

Eleazar Eskin, Sagi Snir

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Part of the challenge of modeling protein sequences is their discrete nature. Many of the most powerful statistical and learning techniques are applicable to points in a Euclidean space but not directly applicable to discrete sequences. One way to apply these techniques to protein sequences is to embed the sequences into a Euclidean space and then apply these techniques to the embedded points. In this paper, we introduce a biologically motivated sequence embedding, the homology kernel, which takes into account intuitions from local alignment, sequence homology, and predicted secondary structure. We apply the homology kernel in several ways. We demonstrate how the homology kernel can be used for protein family classification and outperforms state-of-the-art methods for remote homology detection. We show that the homology kernel can be used for secondary structure prediction and is competitive with popular secondary structure prediction methods. Finally, we show how the homology kernel can be used to incorporate information from homologous sequences in local sequence alignment.
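To make the idea of embedding discrete sequences into a Euclidean space concrete, here is a minimal sketch. It is not the homology kernel described in the abstract, whose definition is not given on this page. It uses a plain k-mer count (spectrum-style) embedding as a stand-in, and the function names, the choice of `k=2`, and the example sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmer_embedding(seq, k=2):
    """Map a protein sequence to a vector of k-mer counts.

    Illustrative stand-in only: this is a simple spectrum-style
    embedding, not the homology kernel from the paper.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # One coordinate per possible k-mer, in a fixed order,
    # so that every sequence lands in the same 20**k-dimensional space.
    return [counts.get("".join(p), 0) for p in product(AMINO_ACIDS, repeat=k)]


def dot(u, v):
    """Inner product of two embedded points (the induced kernel value)."""
    return sum(a * b for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Ordinary Euclidean distance between two embedded points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy sequences: a and b differ at one residue, c is unrelated.
a = kmer_embedding("MKVLAAGIV")
b = kmer_embedding("MKVLAAGLV")
c = kmer_embedding("WWPPHHDDE")
```

Once the sequences are vectors, any method that works on Euclidean points (nearest neighbours, support vector machines, clustering) can be applied to them. In this toy case, `euclidean_distance(a, b)` is smaller than `euclidean_distance(a, c)`, reflecting that the first two sequences share most of their 2-mers.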

Original language: English
Title of host publication: Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05
Publisher: IEEE Computer Society
ISBN (Print): 0780393872, 9780780393875
State: Published - 2005
Externally published: Yes
Event: 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05 - La Jolla, CA, United States
Duration: 14 Nov 2005 to 15 Nov 2005

Publication series

Name: Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05
Volume: 2005

Conference

Conference: 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05
Country/Territory: United States
City: La Jolla, CA
Period: 14/11/05 to 15/11/05

ASJC Scopus subject areas

  • Engineering (all)
