Large-scale design and refinement of stable proteins using sequence-only models

Jedediah M. Singer, S. Novotney, D. Strickland, Hugh K. Haddox, N. Leiby, G.J. Rocklin, Cameron M. Chow, A. Roy, Asim K. Bera, Francis C. Motta, Longxing Cao, Eva-Maria Strauch, T.M. Chidyausiku, Alex Ford, E Ho, C. O. Mackenzie, Hamed Eramian, Frank DiMaio, G. Grigoryan, M. Vaughn, Lance Stewart, David Baker, Eric Klavins

2021

Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we report a neural network model that predicts protein stability based only on sequences of amino acids, and demonstrate its performance by evaluating the stability of almost 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We also report a second neural network model that is able to generate novel stable proteins. Finally, we show that the predictive model can be used to substantially increase the stability of both expert-designed and model-generated proteins.
