Small libraries of protein fragments model native protein structures accurately

Rachel Kolodny, Patrice Koehl, Leonidas Guibas, Michael Levitt

Research output: Contribution to journal › Article › peer-review


Prediction of protein structure depends on the accuracy and complexity of the models used. Here, we represent the polypeptide chain by a sequence of rigid fragments that are concatenated without any degrees of freedom. Fragments chosen from a library of representative fragments are fit to the native structure using a greedy build-up method. This gives a one-dimensional representation of native protein three-dimensional structure whose quality depends on the nature of the library. We use a novel clustering method to construct libraries that differ in the fragment length (four to seven residues) and number of representative fragments they contain (25-300). Each library is characterized by the quality of fit (accuracy) and the number of allowed states per residue (complexity). We find that the accuracy depends on the complexity and varies from 2.9 Å for a 2.7-state model on the basis of fragments of length 7 to 0.76 Å for a 15-state model on the basis of fragments of length 5. Our goal is to find representations that are both accurate and economical (low complexity). The models defined here are substantially better in this regard: with ten states per residue we approximate native protein structure to 1 Å compared to over 20 states per residue needed previously. For the same complexity, we find that longer fragments provide better fits. Unfortunately, libraries of longer fragments must be much larger (for ten states per residue, a seven-residue library is 100 times larger than a five-residue library). As the number of known protein native structures increases, it will be possible to construct larger libraries to better exploit this correlation between neighboring residues. Our fragment libraries, which offer a wide range of optimal fragments suited to different accuracies of fit, may prove to be useful for generating better decoy sets for ab initio protein folding and for generating accurate loop conformations in homology modeling.
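The relation between library size and complexity quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on an assumption that the abstract does not state: each appended fragment overlaps the previous one by three residues, so a fragment of length f contributes f - 3 new residues and a library of s fragments gives s^(1/(f - 3)) states per residue. The function names and the `overlap` parameter are hypothetical. Under this assumption the numbers reproduce the abstract's statement that, at ten states per residue, a seven-residue library is 100 times larger than a five-residue one.

```python
def states_per_residue(library_size: int, fragment_length: int, overlap: int = 3) -> float:
    """Complexity of a fragment library, in allowed states per residue.

    ASSUMPTION (not stated in the abstract): consecutive fragments share
    `overlap` residues, so each fragment adds (fragment_length - overlap)
    new residues to the chain.
    """
    new_residues = fragment_length - overlap
    if new_residues <= 0:
        raise ValueError("fragment_length must exceed overlap")
    return library_size ** (1.0 / new_residues)


def library_size_for(states: int, fragment_length: int, overlap: int = 3) -> int:
    """Library size needed to reach a given number of states per residue."""
    new_residues = fragment_length - overlap
    if new_residues <= 0:
        raise ValueError("fragment_length must exceed overlap")
    return states ** new_residues


# Ten states per residue, as in the abstract's comparison.
five_residue_library = library_size_for(10, 5)    # 10**2
seven_residue_library = library_size_for(10, 7)   # 10**4
size_ratio = seven_residue_library // five_residue_library

print(five_residue_library, seven_residue_library, size_ratio)
```

The same formula, under the same assumption, places the two end points quoted in the abstract inside the stated library range of 25-300 fragments: roughly 2.7 states per residue at length 7 corresponds to about 50 fragments, and 15 states per residue at length 5 to 225.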

Original language: English
Pages (from-to): 297-307
Number of pages: 11
Journal: Journal of Molecular Biology
Issue number: 2
State: Published - 2002
Externally published: Yes


Keywords

  • Discrete models
  • Protein representations

ASJC Scopus subject areas

  • Molecular Biology
  • Structural Biology

