AI matches protein interaction partners

2024-07-05 2024-10-07

2024-07-04 Swiss Federal Institute of Technology in Lausanne (EPFL)

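Proteins are the basic building blocks of life, and understanding how they interact is important for elucidating cellular function and for drug development. Predicting which proteins bind each other, however, has been difficult because of their diversity and structural complexity. The research team led by Anne-Florence Bitbol at EPFL developed DiffPALM, an AI-based approach, to address this problem. The method uses a protein language model to predict protein interactions with high accuracy. Compared with conventional methods, it works even with small datasets and can handle predictions for rare proteins. DiffPALM could be applied in medical research and drug development, where it may contribute to understanding disease mechanisms and developing targeted therapies. The team has made the tool publicly available and hopes it will drive further progress in computational biology.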

<Related information>

Pairing interacting protein sequences using masked language modeling

Umberto Lupo, Damiano Sgarbossa, and Anne-Florence Bitbol
Proceedings of the National Academy of Sciences Published: June 24, 2024
DOI: https://doi.org/10.1073/pnas.2311887121

Significance

Deep learning has brought major advances to the analysis of biological sequences. Self-supervised models, based on approaches from natural language processing and trained on large ensembles of protein sequences, efficiently learn statistical dependence in this data. This includes coevolution patterns between structurally or functionally coupled amino acids, which allows them to capture structural contacts. We propose a method to pair interacting protein sequences which leverages the power of a protein language model trained on multiple sequence alignments. Our method performs well for small datasets that are challenging for existing methods. It can improve structure prediction of protein complexes by supervised methods, which remains more challenging than that of single-chain proteins.

Abstract

Predicting which proteins interact together from amino acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called Differentiable Pairing using Alignment-based Language Models (DiffPALM) that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids within protein chains. It also captures inter-chain coevolution, despite being trained on single-chain data. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. Starting from sequences paired by DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer. It also achieves competitive performance with using orthology-based pairing.
