酵母由来タンパク質医薬品開発を効率化するAIモデル開発(New AI Model Could Cut Costs of Developing Protein Drugs)

ad

2026-02-16 マサチューセッツ工科大学(MIT)

Massachusetts Institute of Technologyの研究チームは、タンパク質医薬の開発コスト削減につながる新たなAIモデルを開発した。モデルはタンパク質配列と構造・機能の関係を学習し、安定性や有効性の高い候補分子を効率的に予測。従来は多数の実験を要した最適化工程を計算上で加速できる点が特長で、開発期間と費用の大幅な削減が期待される。創薬初期段階での候補選別精度向上により、バイオ医薬品の設計プロセスを変革する可能性がある。

酵母由来タンパク質医薬品開発を効率化するAIモデル開発(New AI Model Could Cut Costs of Developing Protein Drugs)
Using a large language model (LLM), MIT researchers analyzed the genetic code of industrial yeast Komagataella phaffii — specifically, the codons that it uses. The new MIT model could then predict which codons would work best for manufacturing a given protein.Credits:Image: MIT News; iStock

<関連情報>

Pichia-CLM: Komagataella phaffiiの言語モデルに基づくコドン最適化パイプライン Pichia-CLM: A language model–based codon optimization pipeline for Komagataella phaffii

Harini Narayanan and J. Christopher Love
Proceedings of the National Academy of Sciences  Published:February 17, 2026

Significance

This paper presents Pichia–Codon language model (Pichia-CLM), a deep learning–based language model for codon optimization to enhance recombinant protein production in the industrially relevant host, Komagataella phaffii. Unlike conventional approaches that rely on codon usage bias metrics (CUB)—often providing a global score and ignoring sequence context—Pichia-CLM leverages the host genome to unbiasedly learn the amino acid-to-codon mapping. Prior deep learning models have attempted codon optimization but typically evaluated performance using CUB metrics with limited experimental validation. In contrast, we experimentally validate Pichia-CLM across six diverse protein classes of varying complexity and consistently observe superior expression titers compared to four commercial codon optimization tools. Our findings also reveal the limitations of using CUB metrics, showing a poor correlation between these scores and observed protein yields.

Abstract

The preference in synonymous codon usage—the so-called codon usage bias (CUB)—is governed by several factors such as the host organism, context and function of the gene, and the position of the codon within the gene itself. We demonstrated that this mapping can be learned from the host’s genome using language models and subsequently applied for codon optimization of heterologous proteins expressed by the host. This pipeline called Pichia–Codon language model (Pichia-CLM) was applied to the industrial host organism, Komagataella phaffii. With this approach, production of heterologous proteins was enhanced up to threefold compared to their native sequences. Furthermore, Pichia-CLM consistently yielded constructs with enhanced productivity for proteins of varied complexity, compared to commercially available tools. Finally, we showed that Pichia-CLM generates sequences resembling the properties of codon usage found in the host’s intrinsic host cell proteins and learned features such as avoiding negative cis-regulatory and repeat elements based on patterns in the genome data. These results show the potential of language models to unbiasedly learn patterns and design robust sequences for improved protein production.

有機化学・薬学
ad
ad
Follow
ad
タイトルとURLをコピーしました