Prediction model for genetic disease risk developed by combining AI and clinical laboratory tests (Mount Sinai Researchers Use AI and Lab Tests to Predict Genetic Disease Risk)

2025-08-29

2025-08-28 Mount Sinai Health System (MSHS)

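A research team at the Mount Sinai school of medicine has developed a new method that combines AI with routine clinical laboratory data to predict whether a rare genetic variant will actually lead to disease. The method analyzes routine tests such as cholesterol levels, blood counts, and kidney function together with electronic health record information to compute an "ML penetrance" score for 10 serious diseases. The score expresses risk on a scale from 0 to 1, with values closer to 1 indicating a higher likelihood of developing disease. An evaluation of more than 1,600 variants showed clear disease risk for some variants previously labeled "uncertain," while some variants considered pathogenic turned out to have little effect in real-world clinical practice. Physicians can use the score to order early testing for high-risk cases and to avoid overdiagnosis in low-risk cases.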

<Related information>

Machine learning-based penetrance of genetic variants

Iain S. Forrest, Ha My T. Vy, Ghislain Rocheleau, Daniel M. Jordan, […] , and Ron Do
Science, published 28 Aug 2025
DOI: https://doi.org/10.1126/science.adm7066

Editor’s summary

When patients are tested for genetic predisposition to clinical conditions, they sometimes are found to have rare gene variants of unknown significance, leaving clinicians to guess at the implications. In addition, mutations may not directly cause the clinical condition but could increase the risk of developing it. To help interpret such scenarios, Forrest et al. used machine learning to evaluate several large cohorts of patients (see the Perspective by Raiken and Stein). The authors used clinical data to calculate disease scores for 10 conditions and then linked those scores with genetic data, including rare mutations with previously unknown roles. In addition to helping to interpret the contributions of rare mutations for the diseases analyzed here, this study provides a model for examining the contribution of unknown genetic variants to other diseases. —Yevgeniya Nusinovich

Structured Abstract

INTRODUCTION

Accurately estimating the penetrance of genetic variants—the probability that an individual with a variant develops disease—is essential for risk assessment and clinical decision-making. Traditional approaches rely on disease-enriched families or cohorts, which are limited by small sample sizes and ascertainment bias. Moreover, case-versus-control classifications oversimplify diseases that exist on a spectrum, further reducing the accuracy of risk estimates. Machine learning (ML) offers a scalable solution by integrating large-scale electronic health record (EHR) and genetic data to assess penetrance in a data-driven, quantitative, and precise manner.

RATIONALE

Sequencing advances have facilitated the discovery of rare variants in disease-associated genes, many of which are submitted to repositories such as ClinVar and classified based on predicted pathogenicity. However, reliance on laboratory and expert review as well as the absence of large variant datasets measuring real-world disease risk can lead to discordant variant classifications. Emerging population-based analyses have revealed that some variants previously classified as pathogenic (P) exhibit low or variable penetrance, whereas variants of uncertain significance (VUS) remain challenging to interpret clinically. To address these challenges, we developed an ML-based approach to estimate penetrance by leveraging routine clinical laboratory tests, which are widely available in health systems, and intersecting them with genetic data.

RESULTS

We constructed ML models for 10 genetic conditions—arrhythmogenic right ventricular cardiomyopathy, familial breast cancer, familial hypercholesterolemia (FH), hypertrophic cardiomyopathy (HCM), adult hypophosphatasia, long QT syndrome, Lynch syndrome, monogenic diabetes, polycystic kidney disease (PKD), and von Willebrand disease—using 1,347,298 participants with EHR data and applied them to an independent exome-sequenced cohort. Using disease probability scores from these models, we computed ML penetrance for 1648 rare variants across 31 autosomal dominant disease-predisposition genes, spanning P, benign (B), VUS, and previously unknown loss-of-function (LoF) variants. ML penetrance was highest for P and LoF variants, followed by VUS, and lowest for B variants, providing refined quantitative estimates compared with traditional case-versus-control methods.
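
As a rough illustration of the idea, and not the authors' implementation, the sketch below assumes that ML penetrance for a variant can be summarized as the mean model-derived disease probability among its carriers. It contrasts that with a conventional case-versus-control estimate, taken here as the fraction of carriers with a recorded diagnosis. The function name, variant IDs, and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical carrier records: (variant_id, disease_probability, is_case).
# disease_probability stands in for the 0-to-1 score an ML model would
# produce from routine lab tests; is_case is a binary diagnosis label.
EXAMPLE_CARRIERS = [
    ("variant_A", 0.75, True),
    ("variant_A", 0.25, False),
    ("variant_B", 0.125, False),
    ("variant_B", 0.375, False),
]


def penetrance_by_variant(carriers):
    """Summarize penetrance per variant under the stated assumptions.

    ml_penetrance: mean disease probability among carriers (assumption).
    case_control_penetrance: fraction of carriers labeled as cases.
    """
    grouped = defaultdict(list)
    for variant_id, probability, is_case in carriers:
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range: {probability}")
        grouped[variant_id].append((probability, is_case))

    result = {}
    for variant_id, rows in grouped.items():
        result[variant_id] = {
            "n_carriers": len(rows),
            "ml_penetrance": mean([p for p, _ in rows]),
            "case_control_penetrance": mean(
                [1.0 if is_case else 0.0 for _, is_case in rows]
            ),
        }
    return result
```

In this toy data, `variant_B` has no diagnosed carriers, so a binary estimate gives zero, while the continuous score still registers a nonzero average probability. That is the kind of graded signal the text describes as "refined quantitative estimates."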

Notably, ML penetrance correlated with disease-relevant clinical outcomes, such as risk of end-stage renal disease for PKD variants and heart failure for HCM variants. ML penetrance also aligned with experimentally derived measures of variant function, reinforcing its biological relevance. Importantly, ML penetrance aided in the evaluation of VUS and previously unknown LoF variants by delineating clinical trajectories—individuals with highly penetrant variants showed perturbed vital signs, electrocardiogram measures, and disease biomarkers over time. For example, individuals with highly ML penetrant FH variants exhibited 119 mg/dl higher low-density lipoprotein cholesterol and those with highly ML penetrant PKD variants had a 40 ml/min lower glomerular filtration rate.

CONCLUSION

This study presents an ML-based blueprint to systematically evaluate penetrance at scale, integrating genomic and clinical phenotype data. By providing refined, individualized disease risk estimates, ML penetrance has the potential to improve variant assessment, guide clinical decision-making, and enhance precision medicine approaches.

ML-based penetrance estimation of genetic variants.
Routine clinical measurements analyzed by ML generate continuous disease scores, which are integrated with genetic data to assess the penetrance of rare variants in disease-predisposition genes. ML-based penetrance correlates with an individual’s clinical outcomes and biomarkers and refines risk assessment of variants beyond traditional pathogenicity categories, presenting a scalable strategy for precision medicine. [Figure created with BioRender.com]

Abstract

Accurate variant penetrance estimation is crucial for precision medicine. We constructed machine learning (ML) models for 10 diseases using 1,347,298 participants with electronic health records, then applied them to an independent cohort with linked exome data. Resulting probabilities were used to evaluate ML penetrance of 1648 rare variants in 31 autosomal dominant disease-predisposition genes. ML penetrance was variable across variant classes, but highest for pathogenic and loss-of-function variants, and was associated with clinical outcomes and functional data. Compared with conventional case-versus-control approaches, ML penetrance provided refined quantitative estimates and aided the interpretation of variants of uncertain significance and loss-of-function variants by delineating clinical trajectories over time. By leveraging ML and deep phenotyping, we present a scalable approach to accurately quantify disease risk of variants.
