Researchers Develop Deep Learning Tool ACEP to Unravel Molecular Mechanisms of Convergent Evolution

2025-09-26 Chinese Academy of Sciences (CAS)

Web summary:
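A team led by Professor Zou Zhengting of the Institute of Zoology, Chinese Academy of Sciences, has developed ACEP (Adaptive Convergence by Embedding of Protein), a new deep-learning method for elucidating the molecular mechanisms of convergent evolution. Convergent evolution is the phenomenon in which different species independently evolve similar traits. Conventional methods have focused on changes at individual amino acid sites in proteins and could not adequately handle higher-order structure and other functionally important elements. The team used pretrained protein language models (PLMs) to generate embeddings that capture higher-order protein features, and built ACEP on these embeddings to detect adaptive convergence across the whole genome. Applied to echolocation-related genes in bats and toothed whales, ACEP identified new candidate genes in addition to known ones, and the results suggested links to physicochemical properties of proteins such as charge density. The study contributes to understanding the molecular mechanisms that underlie functional convergent evolution.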
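The contrast between site-level similarity and a higher-order property such as charge density can be made concrete with a toy example. This is a hypothetical illustration, not part of ACEP: the two peptides are made up, they are assumed to be aligned without gaps, and the charge model is deliberately simplified (Lys/Arg counted as +1, Asp/Glu as -1, His and pKa effects ignored). The helper names `site_identity` and `charge_density` are invented for this sketch.

```python
# Hypothetical toy example (not part of ACEP): two aligned, gap-free peptides
# that share no residue at any site but have the same approximate net charge
# density, a "higher-order" property that site-by-site comparison misses.

POSITIVE = frozenset("KR")  # Lys, Arg counted as +1 (simplified; His ignored)
NEGATIVE = frozenset("DE")  # Asp, Glu counted as -1


def site_identity(a: str, b: str) -> float:
    """Fraction of aligned positions carrying the same amino acid."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def charge_density(seq: str) -> float:
    """Approximate net charge per residue at neutral pH (no pKa modelling)."""
    positives = sum(r in POSITIVE for r in seq)
    negatives = sum(r in NEGATIVE for r in seq)
    return (positives - negatives) / len(seq)


seq_a = "KAKLDAKG"  # made-up sequence
seq_b = "RVRIEVRS"  # made-up sequence: similar but non-identical residue at every site

print(site_identity(seq_a, seq_b))                   # 0.0
print(charge_density(seq_a), charge_density(seq_b))  # 0.25 0.25
```

A site-by-site convergence count finds nothing shared between these two sequences, yet their approximate net charge density is identical. The paper reports that PLM embeddings can reflect high-order features of this kind; a learned embedding is, of course, far richer than one hand-picked property.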

<Related information>

Language models reveal a complex sequence basis for adaptive convergent evolution of protein functions

Zhenqiu Cao, Hongjiu Zhang, and Zhengting Zou
Proceedings of the National Academy of Sciences, September 23, 2025
DOI: https://doi.org/10.1073/pnas.2418254122

Significance

In biology, the repeated emergence of the same functional trait during evolution is important because it offers an opportunity to decode the relations between genome or protein sequences and specific functions. Such functional convergence has largely been linked to sequence convergence at the level of single sites, because conventional methods cannot measure the similarity of high-order sequence features. This study reveals that recent protein language models can extract embeddings from protein sequences that reflect high-order features, and it develops statistical tests to evaluate the adaptive convergence of such features. The findings emphasize an underrated sequence basis for functional trait convergence in evolution, provide a corresponding detection framework, and demonstrate the potential power of deep learning for investigating the complex sequence–function mapping in evolutionary biology.

Abstract

Convergent evolution, or convergence, refers to repeated, independent emergences of the same trait in two or more lineages of species during evolution, often indicating functional adaptation to specific environmental factors. Many computational methods have been proposed to investigate the genetic basis for organismal functional convergence, as an important way to decode the complex sequence–function map of proteins. These methods mostly focus on the convergence of amino acid states at the level of individual sites in functionally related proteins. However, even without site-level sequence similarity, protein function similarity may also stem from convergence of high-order protein features, which cannot be captured by the conventional methods. To fill this gap, we first derived numerical embeddings from protein sequences by pretrained protein language models (PLM). In four previously reported cases, we found that functionally convergent proteins have similar embeddings despite no site-level convergence, indicating that PLM embeddings can reflect convergence of high-order protein features. We then designed a pipeline to detect Adaptive Convergence by Embedding of Protein (ACEP). ACEP tests were significant on known and additional candidate genes with putative adaptive convergence like echolocation and crassulacean acid metabolism. Genome-wide application showed that the ACEP framework can effectively enrich such candidates. Relations between convergences of PLM embeddings and specific protein physicochemical features were further examined. In conclusion, PLM embeddings can indicate adaptive convergence of high-order protein features beyond site identities, demonstrating the power of deep learning tools for investigating the complex mapping between molecular sequences and functions.
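The abstract describes testing whether embeddings converge across independently evolved lineages but does not spell out the statistic. What follows is a rough, hypothetical sketch of what an embedding-based convergence test can look like. It is a generic comparison, not the published ACEP procedure. It assumes each protein has already been turned into a fixed-length vector (for example by averaging a PLM's per-residue outputs) and that ancestral embeddings are available from reconstructed ancestral sequences. The score is the cosine similarity between the embedding changes on two convergent branches (say, a bat branch and a toothed-whale branch), and the empirical p-value compares it with the same quantity computed against control branches. All function names and numbers are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def branch_change(ancestor: Vector, descendant: Vector) -> List[float]:
    """Embedding change along one branch: descendant minus ancestor."""
    return [d - a for a, d in zip(ancestor, descendant)]


def convergence_test(delta_1: Vector, delta_2: Vector,
                     control_deltas: Sequence[Vector]) -> Tuple[float, float]:
    """Hypothetical test: do two convergent branches change in the same direction
    more than branch 1 does with control (non-convergent) branches?

    Returns (score, empirical one-sided p-value with the usual +1 correction).
    """
    observed = cosine(delta_1, delta_2)
    null = [cosine(delta_1, c) for c in control_deltas]
    p_value = (1 + sum(s >= observed for s in null)) / (1 + len(null))
    return observed, p_value


# Made-up 3-dimensional "embeddings"; real PLM embeddings typically have
# hundreds to thousands of dimensions.
delta_bat = branch_change([0.0, 0.0, 0.0], [1.0, 0.9, 0.1])
delta_whale = branch_change([0.2, 0.1, 0.0], [1.1, 1.0, 0.2])
controls = [
    [-0.5, 0.2, 0.8],
    [0.1, -0.7, 0.3],
    [0.0, 0.3, -0.9],
    [-0.6, -0.4, 0.2],
]

score, p = convergence_test(delta_bat, delta_whale, controls)
print(f"score={score:.3f}, p={p:.2f}")
```

With only four control branches the smallest attainable p-value is 1/5 = 0.2, so a genome-wide scan would need many control branches (or a parametric null) before a score like this could flag a gene. Cosine similarity is used here because it compares the direction of change while ignoring its magnitude; that is a design choice of this sketch, not something the source specifies.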
