Where Once Were Black Boxes, NIST's New LANTERN Illuminates

2022-06-23

In a first, a new artificial intelligence tool for bioengineers can be both predictive and explainable.

2022-06-22 National Institute of Standards and Technology (NIST)

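Researchers at the National Institute of Standards and Technology (NIST) have developed a new statistical tool for predicting protein function. Beyond helping with the difficult job of altering proteins in practically useful ways, the tool works by fully interpretable methods, which gives it an advantage over the conventional artificial intelligence (AI) that has supported protein engineering until now.
The new tool, called LANTERN, could prove useful in work ranging from producing biofuels to improving crops to developing new disease treatments. Proteins, as building blocks of biology, are a key element in all of these tasks. While it is comparatively easy to make changes to the strand of DNA that serves as the blueprint for a protein, it remains difficult to determine which specific base pairs (the rungs on the DNA ladder) are the keys to producing a desired effect. Finding these keys has so far relied on a type of AI called deep neural networks (DNNs), which are effective but notoriously opaque to human understanding.
LANTERN has shown the ability to predict the genetic edits needed to create useful differences in three different proteins. One is the spike-shaped protein on the surface of the SARS-CoV-2 virus, which causes COVID-19; understanding how changes in the DNA alter this spike protein could help epidemiologists predict the future of the pandemic. The other two are well-known laboratory workhorses: the LacI protein from E. coli and the green fluorescent protein (GFP), which is used as a marker in biology experiments.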

<Related information>

Interpretable modeling of genotype–phenotype landscapes with state-of-the-art predictive power

Peter D. Tonner, Abe Pressman, and David Ross
Proceedings of the National Academy of Sciences, Published: June 21, 2022
DOI: 10.1073/pnas.2114021119

Abstract

Large-scale measurements linking genetic background to biological function have driven a need for models that can incorporate these data for reliable predictions and insight into the underlying biophysical system. Recent modeling efforts, however, prioritize predictive accuracy at the expense of model interpretability. Here, we present LANTERN (landscape interpretable nonparametric model, https://github.com/usnistgov/lantern), a hierarchical Bayesian model that distills genotype–phenotype landscape (GPL) measurements into a low-dimensional feature space that represents the fundamental biological mechanisms of the system while also enabling straightforward, explainable predictions. Across a benchmark of large-scale datasets, LANTERN equals or outperforms all alternative approaches, including deep neural networks. LANTERN furthermore extracts useful insights of the landscape, including its inherent dimensionality, a latent space of additive mutational effects, and metrics of landscape structure. LANTERN facilitates straightforward discovery of fundamental mechanisms in GPLs, while also reliably extrapolating to unexplored regions of genotypic space.
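The idea of a "latent space of additive mutational effects" can be illustrated with a toy sketch. This is not LANTERN's code or API: the effect matrix `W`, the helper names `latent_position` and `phenotype`, and the fixed sigmoid-shaped surface are all assumptions made up for illustration. In the actual model these quantities are learned from data, and the surface is nonparametric, not a fixed formula.

```python
import numpy as np

# Hypothetical toy setup: 4 possible mutations, 2 latent dimensions.
# Row i of W is the (made-up) latent effect vector of mutation i.
W = np.array([
    [1.0, 0.0],
    [0.5, 0.2],
    [-1.5, 0.0],
    [0.0, 1.0],
])


def latent_position(genotype):
    """Additive step: sum the latent effects of the mutations present."""
    x = np.asarray(genotype, dtype=float)  # 1 if mutation present, else 0
    return x @ W


def phenotype(z):
    """Stand-in nonlinear surface over the latent space (an assumption)."""
    return 1.0 / (1.0 + np.exp(-z[0])) * np.exp(-z[1] ** 2)


# A double mutant lands at the sum of its single-mutant latent positions,
# but its phenotype need not be the sum of the single-mutant phenotypes.
double = latent_position([1, 1, 0, 0])
singles = latent_position([1, 0, 0, 0]) + latent_position([0, 1, 0, 0])
print(double, singles, phenotype(double))
```

Interpretability in this picture comes from the split between the two steps: each mutation has a fixed, readable effect vector, and all nonlinearity lives in a single low-dimensional surface that can be plotted and inspected.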
