Fast and scalable protein stability prediction improvement without additional training or data

Summary

Deep-learning protein sequence models have shown outstanding performance at de novo protein design and variant effect prediction.

By introducing an additional term derived from the models themselves, we substantially improve their performance without further training or additional experimental data. This term aligns the model outputs with the task of stability prediction.

On a task to predict variants that increase protein stability, the absolute success probabilities of ProteinMPNN and ESMif are improved by 11% and 5%, respectively.

Key findings

We demonstrate that the protein stability predictions of inverse folding models can be significantly improved without additional training or data.

Accuracy of predictions for various models on publicly available protein stability datasets

The source of improvement

The improvement in accuracy comes from a simple additional term derived from the model itself. To compute this term, the model is given only the backbone atoms of the single residue being predicted, with no sequence or structural context. This is analogous to standard procedures in classical free energy calculations.
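As a rough illustration of how such a term could enter a score, the sketch below takes two sets of per-residue logits for one position: one computed with full context and one computed from the isolated residue backbone. It subtracts the isolated log-odds from the in-context log-odds, both measured relative to the wild-type residue. The function names, the amino acid ordering, the sign convention, and the choice to combine the terms by subtracting log-odds are all assumptions made for this example, not details confirmed by the text above.

```python
import math

# Assumed alphabetical one-letter ordering; a real model may use another order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Convert raw logits for one position into log-probabilities."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def corrected_scores(full_logits, isolated_logits, wild_type):
    """Score every substitution at one position relative to the wild type.

    full_logits:     hypothetical model output given full context.
    isolated_logits: hypothetical model output given only the backbone
                     atoms of this single residue.
    Assumed convention: a higher score means the substitution is favoured.
    """
    lp_full = log_softmax(full_logits)
    lp_iso = log_softmax(isolated_logits)
    wt = AMINO_ACIDS.index(wild_type)

    scores = {}
    for i, aa in enumerate(AMINO_ACIDS):
        in_context = lp_full[i] - lp_full[wt]  # log-odds with full context
        reference = lp_iso[i] - lp_iso[wt]     # log-odds for isolated residue
        scores[aa] = in_context - reference    # assumed: subtract the reference
    return scores
```

Under these assumptions the wild-type residue always scores zero, and a position where the full-context and isolated predictions agree contributes nothing, so only the information added by the surrounding context is scored.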

Predicting thousands of residues per second

ProteinMPNN is modified to use full sequence context, and we introduce a novel tied decoding scheme to improve computational efficiency and enable saturation mutagenesis studies at scale.
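To make the scale of a saturation mutagenesis study concrete, the sketch below enumerates every single-point substitution of a sequence from a per-position score matrix and ranks them. It does not implement the tied decoding scheme; it only assumes that some model has already produced one score per position and amino acid. The `score_matrix` layout, the amino acid ordering, and the function name are hypothetical.

```python
# Assumed alphabetical one-letter ordering; a real model may use another order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_mutagenesis(sequence, score_matrix):
    """List every single-point substitution with its score, best first.

    score_matrix[i][j] is a hypothetical score for placing AMINO_ACIDS[j]
    at position i of the sequence (higher is assumed to be better).
    """
    variants = []
    for pos, wild_type in enumerate(sequence):
        for j, aa in enumerate(AMINO_ACIDS):
            if aa == wild_type:
                continue  # skip the unmutated residue
            name = f"{wild_type}{pos + 1}{aa}"  # e.g. "A12G", 1-based position
            variants.append((name, score_matrix[pos][j]))
    # Rank variants from most to least favourable score.
    variants.sort(key=lambda item: item[1], reverse=True)
    return variants
```

A sequence of length N yields 19 × N variants, so a 300-residue protein gives 5,700 candidate substitutions, which is why throughput per residue matters for this kind of study.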

Citing this work

Improving Inverse Folding models at Protein Stability Prediction without additional Training or Data.

Oliver Dutton, Sandro Bottaro, Michele Invernizzi, Istvan Redl, Albert Chung, Carlo Fisicaro, Fabio Airoldi, Stefano Ruschetta, Louie Henderson, Benjamin M.J. Owens, Patrik Foerch, and Kamil Tamiola.

bioRxiv 2024.06.15.599145; https://doi.org/10.1101/2024.06.15.599145

Source code

https://github.com/PeptoneLtd/proteinmpnn_ddg

Interactive Code on Google Colab

https://colab.research.google.com/github/PeptoneLtd/proteinmpnn_ddg/blob/main/ProteinMPNN_ddG.ipynb

About Peptone

Peptone is a translational biophysics company focused on the discovery of novel therapeutics against diseases driven by intrinsically disordered proteins (IDPs). IDPs are proteins without a fixed structure that play a significant role in health and disease.