2025-08-28 Mount Sinai Health System (MSHS)
<Related information>
- https://www.mountsinai.org/about/newsroom/2025/mount-sinai-researchers-use-ai-and-lab-tests-to-predict-genetic-disease-risk
- https://www.science.org/doi/10.1126/science.adm7066
Machine learning-based penetrance of genetic variants
Iain S. Forrest, Ha My T. Vy, Ghislain Rocheleau, Daniel M. Jordan, […], and Ron Do
Science, Published: 28 Aug 2025
DOI: https://doi.org/10.1126/science.adm7066
Editor’s summary
When patients are tested for genetic predisposition to clinical conditions, they sometimes are found to have rare gene variants of unknown significance, leaving clinicians to guess at the implications. In addition, mutations may not directly cause the clinical condition but could increase the risk of developing it. To help interpret such scenarios, Forrest et al. used machine learning to evaluate several large cohorts of patients (see the Perspective by Raiken and Stein). The authors used clinical data to calculate disease scores for 10 conditions and then linked those scores with genetic data, including rare mutations with previously unknown roles. In addition to helping to interpret the contributions of rare mutations for the diseases analyzed here, this study provides a model for examining the contribution of unknown genetic variants to other diseases. —Yevgeniya Nusinovich
Structured Abstract
INTRODUCTION
Accurately estimating the penetrance of genetic variants—the probability that an individual with a variant develops disease—is essential for risk assessment and clinical decision-making. Traditional approaches rely on disease-enriched families or cohorts, which are limited by small sample sizes and ascertainment bias. Moreover, case-versus-control classifications oversimplify diseases that exist on a spectrum, further reducing the accuracy of risk estimates. Machine learning (ML) offers a scalable solution by integrating large-scale electronic health record (EHR) and genetic data to assess penetrance in a data-driven, quantitative, and precise manner.
RATIONALE
Sequencing advances have facilitated the discovery of rare variants in disease-associated genes, many of which are submitted to repositories such as ClinVar and classified based on predicted pathogenicity. However, reliance on laboratory and expert review as well as the absence of large variant datasets measuring real-world disease risk can lead to discordant variant classifications. Emerging population-based analyses have revealed that some variants previously classified as pathogenic (P) exhibit low or variable penetrance, whereas variants of uncertain significance (VUS) remain challenging to interpret clinically. To address these challenges, we developed an ML-based approach to estimate penetrance by leveraging routine clinical laboratory tests, which are widely available in health systems, and intersecting them with genetic data.
RESULTS
We constructed ML models for 10 genetic conditions—arrhythmogenic right ventricular cardiomyopathy, familial breast cancer, familial hypercholesterolemia (FH), hypertrophic cardiomyopathy (HCM), adult hypophosphatasia, long QT syndrome, Lynch syndrome, monogenic diabetes, polycystic kidney disease (PKD), and von Willebrand disease—using 1,347,298 participants with EHR data and applied them to an independent exome-sequenced cohort. Using disease probability scores from these models, we computed ML penetrance for 1648 rare variants across 31 autosomal dominant disease-predisposition genes, spanning P, benign (B), VUS, and previously unknown loss-of-function (LoF) variants. ML penetrance was highest for P and LoF variants, followed by VUS, and lowest for B variants, providing refined quantitative estimates compared with traditional case-versus-control methods.
Notably, ML penetrance correlated with disease-relevant clinical outcomes, such as risk of end-stage renal disease for PKD variants and heart failure for HCM variants. ML penetrance also aligned with experimentally derived measures of variant function, reinforcing its biological relevance. Importantly, ML penetrance aided in the evaluation of VUS and previously unknown LoF variants by delineating clinical trajectories—individuals with highly penetrant variants showed perturbed vital signs, electrocardiogram measures, and disease biomarkers over time. For example, individuals with highly ML penetrant FH variants exhibited 119 mg/dl higher low-density lipoprotein cholesterol and those with highly ML penetrant PKD variants had a 40 ml/min lower glomerular filtration rate.
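The contrast drawn in the Results between ML penetrance and case-versus-control estimates can be illustrated with a minimal sketch. It assumes, for illustration only, that ML penetrance for a variant is summarized as the mean model-derived disease probability among its carriers, and that the conventional estimate is the fraction of carriers with a diagnosis. The abstract does not give the authors' exact formula, so both definitions, the function names, and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Each record is (variant_id, disease_probability, is_case).
# disease_probability: score in [0, 1] from a disease model (assumed input).
# is_case: True if the carrier has a recorded diagnosis (assumed input).

def ml_penetrance(records):
    """Mean disease probability among carriers of each variant (assumed definition)."""
    scores = defaultdict(list)
    for variant_id, probability, _is_case in records:
        scores[variant_id].append(probability)
    return {variant: mean(values) for variant, values in scores.items()}

def case_control_penetrance(records):
    """Fraction of carriers of each variant who are diagnosed cases."""
    flags = defaultdict(list)
    for variant_id, _probability, is_case in records:
        flags[variant_id].append(1 if is_case else 0)
    return {variant: sum(values) / len(values) for variant, values in flags.items()}

# Hypothetical carriers of two made-up variants.
example_records = [
    ("variant_A", 0.75, True),
    ("variant_A", 0.25, False),
    ("variant_B", 0.125, False),
]

if __name__ == "__main__":
    print(ml_penetrance(example_records))
    print(case_control_penetrance(example_records))
```

In this toy data, the single undiagnosed carrier of variant_B still contributes a nonzero disease probability, whereas the binary estimate for that variant is zero. That is one way a continuous score can refine a case-versus-control estimate.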
CONCLUSION
This study presents an ML-based blueprint to systematically evaluate penetrance at scale, integrating genomic and clinical phenotype data. By providing refined, individualized disease risk estimates, ML penetrance has the potential to improve variant assessment, guide clinical decision-making, and enhance precision medicine approaches.
ML-based penetrance estimation of genetic variants.
Routine clinical measurements analyzed by ML generate continuous disease scores, which are integrated with genetic data to assess the penetrance of rare variants in disease-predisposition genes. ML-based penetrance correlates with an individual’s clinical outcomes and biomarkers and refines risk assessment of variants beyond traditional pathogenicity categories, presenting a scalable strategy for precision medicine. [Figure created with BioRender.com]
Abstract
Accurate variant penetrance estimation is crucial for precision medicine. We constructed machine learning (ML) models for 10 diseases using 1,347,298 participants with electronic health records, then applied them to an independent cohort with linked exome data. Resulting probabilities were used to evaluate ML penetrance of 1648 rare variants in 31 autosomal dominant disease-predisposition genes. ML penetrance was variable across variant classes, but highest for pathogenic and loss-of-function variants, and was associated with clinical outcomes and functional data. Compared with conventional case-versus-control approaches, ML penetrance provided refined quantitative estimates and aided the interpretation of variants of uncertain significance and loss-of-function variants by delineating clinical trajectories over time. By leveraging ML and deep phenotyping, we present a scalable approach to accurately quantify disease risk of variants.


