Using protein fragments for searching and data-mining protein databases

Chen Keasar, Rachel Kolodny

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Proteins are macromolecules involved in virtually all life processes. Protein sequence and structure data are accumulating at an ever-increasing rate in publicly available databases. Extracting knowledge from these databases requires efficient and accurate tools, and building such tools is a major goal of computational structural biology. The tasks we consider are searching and mining protein data; we rely on protein fragment libraries to build more efficient tools. We describe FragBag, an example of using fragment libraries to improve protein structural search. To search for patterns in structure space, we discuss methods to generate efficient low-dimensional maps. In particular, we use these maps to identify patterns of functional diversity and sequence diversity. Finally, we discuss how to extend these methods to protein sequences. Doing so requires predicting local structure from sequence; we survey previous work suggesting that this task is feasible. Furthermore, we show that such predictions can be used to improve sequence alignments. In this way, protein fragments can leverage protein structural data to improve remote homology detection.
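
The abstract does not specify how FragBag works, so the following is only a minimal sketch of the general bag-of-fragments idea, under stated assumptions: each backbone window is assigned to its closest library fragment, the protein is summarized by the resulting count vector, and two proteins are compared by the cosine similarity of their vectors. All function names and the toy two-fragment library are hypothetical, and distance RMSD (comparison of internal distances) is swapped in for superposition-based RMSD purely to keep the example dependency-free.

```python
import math
from collections import Counter

def pairwise_distances(points):
    """Internal distances between all point pairs (rotation/translation invariant)."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

def drmsd(a, b):
    """Distance RMSD between two equal-length coordinate segments (assumed metric)."""
    da, db = pairwise_distances(a), pairwise_distances(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))

def bag_of_fragments(backbone, library):
    """Count how many backbone windows are best matched by each library fragment."""
    k = len(library[0])  # all library fragments are assumed to share one length
    counts = Counter()
    for start in range(len(backbone) - k + 1):
        window = backbone[start:start + k]
        best = min(range(len(library)), key=lambda i: drmsd(window, library[i]))
        counts[best] += 1
    return [counts[i] for i in range(len(library))]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Hypothetical toy library: one straight and one bent three-point fragment.
library = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],  # fragment 0: straight
    [(0, 0, 0), (1, 0, 0), (1, 1, 0)],  # fragment 1: right-angle bend
]

# Hypothetical toy "backbones" (real input would be C-alpha coordinates).
straight = [(i, 0, 0) for i in range(5)]
staircase = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0), (2, 2, 0)]
mixed = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)]

print(bag_of_fragments(straight, library))   # [3, 0]
print(bag_of_fragments(staircase, library))  # [0, 3]
print(bag_of_fragments(mixed, library))      # [1, 1]
```

Because every protein becomes a fixed-length vector, a database search in this sketch reduces to comparing vectors, for example with `cosine_similarity(bag_of_fragments(straight, library), bag_of_fragments(mixed, library))`, rather than aligning structures pairwise.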

Original language: English
Title of host publication: Artificial Intelligence and Robotics Methods in Computational Biology - Papers from the 2013 AAAI Workshop, Technical Report
Publisher: AI Access Foundation
Pages: 14-19
Number of pages: 6
ISBN (Print): 9781577356172
State: Published - 2013
Event: 2013 AAAI Workshop - Bellevue, WA, United States
Duration: 14 Jul 2013 → 14 Jul 2013

Publication series

Name: AAAI Workshop - Technical Report
Volume: WS-13-06

Conference

Conference: 2013 AAAI Workshop
Country/Territory: United States
City: Bellevue, WA
Period: 14/07/13 → 14/07/13

ASJC Scopus subject areas

  • General Engineering
