アルゴンヌのチームがAI駆動型タンパク質設計の新境地を開拓(Argonne team breaks new ground in AI-driven protein design)

2024-11-07

2024-11-06 アルゴンヌ国立研究所(ANL)

Using the MProt-DPO framework, scientists created synthetic versions of malate dehydrogenase that preserve the protein’s critical structure and key binding areas. (Image by Arvind Ramanathan/Argonne National Laboratory.)

アルゴンヌ国立研究所のチームは、AIと超高性能コンピューターを活用して、新しいタンパク質設計を迅速化する革新的なMProt-DPOフレームワークを開発しました。このフレームワークは、タンパク質の配列データ、実験結果、シミュレーション、テキスト記述を統合し、薬剤や環境保全に役立つタンパク質を効率的に探索します。超大規模データを処理することで、AIは膨大な設計可能性を分析し、最適なタンパク質を提案可能です。

<関連情報>

MProt-DPO:直接選好最適化により、マルチモーダルなタンパク質設計ワークフローのエクサフロップスの壁を破る MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization

Gautham Dharuman,Kyle Hippe,,Alexander Brace,Sam Foreman,Varuni Sastry,Huihuo Zheng,Logan Ward,Archit Vasan,Bharat Kale,Carla Mann,Heng Ma,Yun-Hsuan Cheng,Shengchao Liu,Chaowei Xiao,Koichi Yamada,Intel Corporation,Murali Emani,Thomas Gibbs,Ian Foster,Rick Stevens,Anima Anandkumar,Venkatram Vishwanath,Arvind Ramanathan,Yuliana Zamora,Servesh Muralidharan,Michael E. Papka

SessionACM Gordon Bell Finalists Presentations 2

DescriptionWe present a scalable, end-to-end workflow for protein design. By augmenting protein sequences with natural language descriptions of biochemical properties, we train generative models to preferentially align with protein fitness landscapes. Through complex experimental- and simulation-based observations, we integrate these measures as preferred parameters for generating new protein variants and demonstrate our workflow on five diverse supercomputers. We achieve >1 ExaFLOPS sustained performance in mixed precision on each supercomputer and a maximum sustained performance of 4.11 ExaFLOPS and peak performance of 5.57 ExaFLOPS. We establish the performance of our model on two tasks: (1) across a predetermined benchmark dataset of deep mutational scanning experiments to optimize the fitness-determining mutations in the yeast protein HIS7, and (2) in optimizing the design of the enzyme malate dehydrogenase to achieve lower activation barriers (and therefore increased catalytic rates) using simulation data. Our implementation thus sets high watermarks for multimodal protein design workflows.

テキスト誘導型タンパク質設計フレームワーク A Text-guided Protein Design Framework

Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar
ArXix last revised 12 Aug 2024 (this version, v3
DOI:https://doi.org/10.48550/arXiv.2302.04611

Abstract

Current AI-assisted protein design mainly utilizes protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in the text format describing proteins’ high-level functionalities. Yet, whether the incorporation of such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three subsequent steps: ProteinCLAP which aligns the representation of two modalities, a facilitator that generates the protein representation from the text modality, and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441K text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90\% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.

月	火	水	木	金	土	日
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30