How Deep Learning Tools Can Help Protein Engineers Find Good Sequences

Research output: Contribution to journal › Article › peer-review


The deep learning revolution introduced a new and efficacious way to address computational challenges in a wide range of fields, relying on large data sets and powerful computational resources. In protein engineering, we consider the challenge of computationally predicting properties of a protein and designing sequences with these properties. Indeed, accurate and fast deep network oracles for different properties of proteins have been developed. These learn to predict a property from an amino acid sequence by training on large sets of proteins that have this property. In particular, deep networks can learn from the set of all known protein sequences to identify ones that are protein-like. A fundamental challenge when engineering sequences that are both protein-like and satisfy a desired property is that such sequences are rare within the vast space of all possible sequences. When searching for these very rare instances, one would like to use good sampling procedures. Sampling approaches that are decoupled from the prediction of the property, or in which the predictor is used only after sampling to identify good instances, are less efficient. The alternative is to use sampling methods that are geared to generate sequences satisfying and/or optimizing the predictor’s desired properties. Deep learning has a class of architectures, denoted as generative models, which offer the capability of sampling from the learned distribution of a predicted property. Here, we review the use of deep learning tools to find good sequences for protein engineering, including developing oracles/predictors of a property of the proteins and methods that sample from a distribution of protein-like sequences to optimize the desired property.
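The contrast the abstract draws between sampling that is decoupled from the predictor and sampling that is steered by it can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the article: `toy_oracle` is a hypothetical stand-in for a trained property predictor (it simply scores agreement with a hidden target sequence), and `guided_search` is a greedy single-site mutation search, a much simpler technique than the deep generative models the review covers. Both strategies are given the same number of oracle calls.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_sequence(length, rng):
    """Draw a uniformly random amino acid sequence."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def toy_oracle(seq, target):
    """Hypothetical predictor: fraction of positions that match a hidden target.

    A real oracle would be a trained network; this stand-in only makes
    good sequences rare, which is the situation the abstract describes.
    """
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def sample_then_filter(oracle, length, budget, rng):
    """Decoupled sampling: draw blindly, score afterwards, keep the best."""
    candidates = (random_sequence(length, rng) for _ in range(budget))
    return max(candidates, key=oracle)


def guided_search(oracle, length, budget, rng):
    """Oracle-guided sampling: propose single-site mutations and keep a
    proposal only when the oracle score does not decrease."""
    current = random_sequence(length, rng)
    current_score = oracle(current)
    for _ in range(budget - 1):  # one call was spent on the starting point
        pos = rng.randrange(length)
        candidate = current[:pos] + rng.choice(AMINO_ACIDS) + current[pos + 1:]
        candidate_score = oracle(candidate)
        if candidate_score >= current_score:
            current, current_score = candidate, candidate_score
    return current


def compare(seed=0, length=30, budget=2000):
    """Return (filter_score, guided_score) under the same oracle budget."""
    rng = random.Random(seed)
    target = random_sequence(length, rng)  # hidden from both strategies

    def oracle(seq):
        return toy_oracle(seq, target)

    filtered = sample_then_filter(oracle, length, budget, rng)
    guided = guided_search(oracle, length, budget, rng)
    return oracle(filtered), oracle(guided)


if __name__ == "__main__":
    filter_score, guided_score = compare()
    print(f"sample-then-filter: {filter_score:.2f}")
    print(f"oracle-guided:      {guided_score:.2f}")
```

In this toy setting the guided search typically reaches a far higher score than blind sampling followed by filtering, because each oracle call informs the next proposal. The toy objective is smooth and has a single optimum, so it says nothing about how such a search behaves on a real protein fitness landscape.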

Original language: English
Pages (from-to): 6440-6450
Number of pages: 11
Journal: Journal of Physical Chemistry B
Issue number: 24
State: Published - 24 Jun 2021

Bibliographical note

Publisher Copyright:
© 2021 The Authors. Published by American Chemical Society

ASJC Scopus subject areas

  • Materials Chemistry
  • Surfaces, Coatings and Films
  • Physical and Theoretical Chemistry

