A simple and robust experimental process for protein engineering

2024-03-20

2024-03-13 University of Michigan

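According to a new study by researchers at the University of Michigan, a protein engineering method that combines simple, cost-effective experiments with machine learning models can predict which proteins will be effective for a given purpose. The method could be used to build proteins and peptides for applications ranging from industrial tools to therapeutics. For example, it could help accelerate the development of stable peptides for treating diseases that current drugs cannot address.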

<Related information>

Machine learning to predict continuous protein properties from binary cell sorting data and map unseen sequence space

Marshall Case, Matthew Smith, Jordan Vinh, and Greg Thurber
Proceedings of the National Academy of Sciences, Published: March 7, 2024
DOI: https://doi.org/10.1073/pnas.2311726121

Significance

We demonstrate that, surprisingly, information obtained from simple sorting experiments coupled with linear machine learning models consistently predicts continuous protein properties across multiple protein engineering tasks. The ability to readily predict protein sequence for one or more fitness objectives (affinity, fluorescence, and specificity) can reduce the cost and increase the scale of experimental measurements of protein fitness while retaining the accuracy of more complex experimental methods. This manuscript further provides a powerful protein optimization method to harness information from commonly obtained cell sorting data to design high fitness agents that lie beyond experimentally measured sequence space.
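The core idea, a linear model trained only on binary sort outcomes whose score tracks a continuous property, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are synthetic, the hidden fitness is additive by construction, the sort gate is placed at the median, and the model is a plain gradient-descent logistic regression on one-hot encoded residues. The function names (`features`, `fit_logistic`, `score`, `demo`) are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def features(seq):
    """Indices of the active one-hot slots, one per sequence position."""
    return [pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)
            for pos, aa in enumerate(seq)]


def fit_logistic(seqs, labels, n_features, lr=1.0, epochs=300, l2=1e-3):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    w, b = [0.0] * n_features, 0.0
    feats = [features(s) for s in seqs]
    n = len(seqs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for f, y in zip(feats, labels):
            z = b + sum(w[j] for j in f)
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += err
            for j in f:
                grad_w[j] += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(seq, w, b):
    """Linear score (log-odds of landing in the positive bin)."""
    return b + sum(w[j] for j in features(seq))


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def demo(n=400, length=4, seed=0):
    """Train on binary labels only, then compare scores with hidden fitness."""
    rng = random.Random(seed)
    n_features = length * len(AMINO_ACIDS)
    # Assumption: the true (hidden) fitness is additive over residues.
    true_w = [rng.gauss(0, 1) for _ in range(n_features)]
    seqs = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            for _ in range(n)]
    fitness = [sum(true_w[j] for j in features(s)) for s in seqs]
    # Assumption: a noisy reading above the median lands in the positive bin.
    gate = sorted(fitness)[n // 2]
    labels = [1.0 if f + rng.gauss(0, 0.5) > gate else 0.0 for f in fitness]
    split = 3 * n // 4
    w, b = fit_logistic(seqs[:split], labels[:split], n_features)
    predicted = [score(s, w, b) for s in seqs[split:]]
    return pearson(predicted, fitness[split:])


if __name__ == "__main__":
    print(f"held-out Pearson r = {demo():.2f}")
```

The model never sees a continuous measurement, yet its linear score can be compared directly against the hidden fitness of held-out sequences. Real sorting data would involve sequencing read counts and several gates, which this sketch does not model.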

Abstract

Proteins are a diverse class of biomolecules responsible for wide-ranging cellular functions, from catalyzing reactions to recognizing pathogens. The ability to evolve proteins rapidly and inexpensively toward improved properties is a common objective for protein engineers. Powerful high-throughput methods like fluorescent activated cell sorting and next-generation sequencing have dramatically improved directed evolution experiments. However, it is unclear how to best leverage these data to characterize protein fitness landscapes more completely and identify lead candidates. In this work, we develop a simple yet powerful framework to improve protein optimization by predicting continuous protein properties from simple directed evolution experiments using interpretable, linear machine learning models. Importantly, we find that these models, which use data from simple but imprecise experimental estimates of protein fitness, have predictive capabilities that approach more precise but expensive data. Evaluated across five diverse protein engineering tasks, continuous properties are consistently predicted from readily available deep sequencing data, demonstrating that protein fitness space can be reasonably well modeled by linear relationships among sequence mutations. To prospectively test the utility of this approach, we generated a library of stapled peptides and applied the framework to predict affinity and specificity from simple cell sorting data. We then coupled integer linear programming, a method to optimize protein fitness from linear weights, with mutation scores from machine learning to identify variants in unseen sequence space that have improved and co-optimal properties. This approach represents a versatile tool for improved analysis and identification of protein variants across many domains of protein engineering.
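One way to picture how integer linear programming can turn per-mutation scores into a co-optimized sequence is the 0/1 program below. It is an illustrative formulation under stated assumptions, not the one used in the paper: $x_{i,a}$ is a binary variable that selects residue $a$ at position $i$, $w^{\mathrm{aff}}_{i,a}$ and $w^{\mathrm{spec}}_{i,a}$ are hypothetical linear weights learned for affinity and specificity, and $\tau$ is a hypothetical minimum specificity score.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{L} \sum_{a \in \mathcal{A}} w^{\mathrm{aff}}_{i,a}\, x_{i,a} \\
\text{s.t.} \quad & \sum_{a \in \mathcal{A}} x_{i,a} = 1 \quad \text{for } i = 1, \dots, L
  && \text{(exactly one residue per position)} \\
& \sum_{i=1}^{L} \sum_{a \in \mathcal{A}} w^{\mathrm{spec}}_{i,a}\, x_{i,a} \ge \tau
  && \text{(specificity score at least } \tau\text{)} \\
& x_{i,a} \in \{0, 1\}
\end{aligned}
```

Without the second constraint the optimum is simply the best-scoring residue at each position. The coupling constraint is what makes a solver useful, and sweeping $\tau$ would trace out trade-offs between the two objectives.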
