New AI model improves prediction power for genomics related to disease

2024-11-08 Los Alamos National Laboratory (LANL)

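Researchers at Los Alamos National Laboratory have developed an AI-based method for extracting new insights from genomic data. The method analyzes vast amounts of genetic information and helps identify the genetic variants that cause disease. It is expected to contribute in particular to the diagnosis of rare diseases and to the development of treatments for them. The technology may also help advance personalized medicine and make the drug development process more efficient.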

<Related information>

DNA breathing integration with deep learning foundational model advances genome-wide binding prediction of human transcription factors

Anowarul Kabir, Manish Bhattarai, Selma Peterson, Yonatan Najman-Licht, Kim Ø Rasmussen, Amarda Shehu, Alan R Bishop, Boian Alexandrov, Anny Usheva
Nucleic Acids Research, Published: 13 September 2024
DOI: https://doi.org/10.1093/nar/gkae783

Graphical Abstract

Abstract

It was previously shown that DNA breathing, thermodynamic stability, as well as transcriptional activity and transcription factor (TF) bindings are functionally correlated. To ascertain the precise relationship between TF binding and DNA breathing, we developed the multi-modal deep learning model EPBDxDNABERT-2, which is based on the Extended Peyrard-Bishop-Dauxois (EPBD) nonlinear DNA dynamics model. To train our EPBDxDNABERT-2, we used chromatin immunoprecipitation sequencing (ChIP-Seq) data comprising 690 ChIP-seq experimental results encompassing 161 distinct TFs and 91 human cell types. EPBDxDNABERT-2 significantly improves the prediction of over 660 TF-DNA, with an increase in the area under the receiver operating characteristic (AUROC) metric of up to 9.6% when compared to the baseline model that does not leverage DNA biophysical properties. We expanded our analysis to an in vitro high-throughput Systematic Evolution of Ligands by Exponential enrichment (HT-SELEX) dataset of 215 TFs from 27 families, comparing EPBD with established frameworks. The integration of the DNA breathing features with the DNABERT-2 foundational model greatly enhanced TF-binding predictions. Notably, EPBDxDNABERT-2, trained on large-scale multi-species genomes with a cross-attention mechanism, improved predictive power, shedding light on the mechanisms underlying disease-related non-coding variants discovered in genome-wide association studies.
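
For readers unfamiliar with the AUROC metric cited in the abstract, the following is a minimal sketch of how it can be computed and compared between two models. It uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example receives the higher score. The function name `auroc` and all labels and scores are illustrative assumptions, not code or data from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: iterable of 0/1 ground-truth values (1 = TF binds, 0 = no binding)
    scores: iterable of predicted binding scores, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy values (assumed for illustration only, not from the paper).
labels = [0, 0, 1, 1]
baseline_scores = [0.1, 0.4, 0.35, 0.8]    # hypothetical sequence-only model
augmented_scores = [0.1, 0.3, 0.35, 0.8]   # hypothetical model with extra features

baseline_auc = auroc(labels, baseline_scores)
augmented_auc = auroc(labels, augmented_scores)
print(f"baseline AUROC:  {baseline_auc:.2f}")
print(f"augmented AUROC: {augmented_auc:.2f}")
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of bound and unbound sequences. The double loop is quadratic in the number of examples, so a rank-based implementation would be preferable for datasets of realistic size.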
