Motivation: Most methods for comparing protein structures use three-dimensional (3D) structural information. At the same time, it has been shown that a 1D string representation of local protein structure retains a degree of structural information. This type of representation can be a powerful tool for protein structure comparison and classification, given the arsenal of sequence comparison tools developed by computational biology. To use it this way, however, one must first understand how much information is contained in the various possible 1D representations of protein structure.

Results: Here we describe the use of a particular structure fragment library, denoted here as KL-strings, for the 1D representation of protein structure. Using KL-strings, we develop an infrastructure for comparing protein structures with a 1D representation. This study focuses on the added value gained from such a description. We show that the new local structure language adds resolution to the traditional three-state (helix, strand and coil) secondary structure description, and provides a high degree of accuracy in recognizing structural similarities when used with a pairwise alignment benchmark. The results of this study have immediate applications to fast structure recognition and to fold prediction and classification.
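To illustrate why a 1D representation lets standard sequence tools be reused, the following is a minimal sketch of a global (Needleman-Wunsch style) alignment score computed on two structure strings. It is not the method used in the study: the function name `global_alignment_score`, the letters in the example strings, and the flat match, mismatch and gap scores are all assumptions made for illustration. A realistic comparison would most likely replace the flat scores with a substitution matrix derived for the structural alphabet.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score of two strings with a linear gap penalty.

    The scoring values are illustrative placeholders, not parameters
    taken from the study.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            ))
        prev = curr
    return prev[-1]


# Hypothetical structure strings; the letters are placeholders and are
# not the actual KL-string alphabet.
s1 = "ABCD"
s2 = "ABD"
score = global_alignment_score(s1, s2)
```

With these placeholder scores, two strings that differ by a single inserted letter still score well, which is the kind of tolerance that makes alignment-based comparison of 1D structure strings practical.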
Bibliographical note
Funding Information:
We are grateful to Barry Grant and Łukasz Jaroszewski for their assistance in analyzing the substitution matrices. Thanks to Mickey Kosloff for critical reading and helpful suggestions. We would like to thank the members of the Godzik Lab for stimulating discussions and valuable input. The work of E.S. is supported by the Israel Science Foundation and by the Leo and Julia Forchheimer Center for Molecular Genetics. This study was funded by NIH grant P01-GM62308.
ASJC Scopus subject areas
- Computational Mathematics
- Molecular Biology
- Statistics and Probability
- Computer Science Applications
- Computational Theory and Mathematics