2025-05-07 Mount Sinai Health System (MSHS)
Comparison of monthly delirium detection rates before any ML-model deployment (pre-ML) and following deployment of the multimodal ML-delirium risk stratification model in live clinical practice (post-ML).
<Related information>
- https://www.mountsinai.org/about/newsroom/2025/ai-model-improves-delirium-prediction-leading-to-better-health-outcomes-for-hospitalized-patients
- https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2833621
Machine Learning Multimodal Model for Delirium Risk Stratification
Joseph I. Friedman, MD; Prathamesh Parchure, MSC; Fu-Yuan Cheng, MS; et al
JAMA Network Open Published:May 7, 2025
DOI:10.1001/jamanetworkopen.2025.8874
Key Points
Question Can a machine learning model be used to accurately stratify risk of hospital delirium in live clinical practice?
Findings This quality improvement study including 32 284 inpatient admissions developed an automated multimodal machine learning delirium risk stratification model that demonstrated acceptable discriminative performance in live clinical practice. Additional analyses using 7023 admissions assessed for delirium with the Confusion Assessment Method showed that model deployment was associated with a significant 4-fold increase in delirium detection rates and significant reductions in daily doses of benzodiazepine and antipsychotic medications.
Meaning These findings suggest that a machine learning model may be used to automate delirium risk stratification in live clinical practice and may enhance delirium identification and care.
Abstract
Importance Automating the identification of risk for developing hospital delirium with models that use machine learning (ML) could facilitate more rapid prevention, identification, and treatment of delirium. However, there are very few reports on the performance of ML models for delirium risk stratification in live clinical practice.
Objective To report on development, operationalization, and validation of a multimodal ML model for delirium risk stratification in live clinical practice and its associations with workflow and clinical outcomes.
Design, Setting, and Participants This quality improvement study developed an ML model supported by automated electronic medical records to stratify the risk of non–intensive care unit delirium in live clinical practice using the Confusion Assessment Method as the diagnostic reference standard, with an iterative model update method. Data from patients aged at least 60 years admitted to non–intensive care units at Mount Sinai Hospital between January 2016 and January 2020 were used to train and test the ML model presented. The model was validated in live clinical practice from March 2023 to March 2024. Analysis of the model’s associations with workflow and clinical outcomes was conducted retrospectively in 2024, comparing hospitalized patients prior to deployment of any model version (pre-ML cohort) and during model clinical deployment (post-ML cohort).
Main Outcomes and Measures Outcomes of interest were area under the receiver operating characteristic curve, monthly delirium detection rates, median length of hospital stay, and daily doses of opiate, benzodiazepine, and antipsychotic medications administered.
Results The overall sample included 32 284 inpatient admissions (mean [SD] age, 73.56 [9.67] years; 15 157 [46.9%] women). A total of 25 261 inpatient admissions of older patients with both medical and surgical primary diagnoses represented the combined model testing and training cohort (median [IQR] age, 73.37 [66.42-81.36] years) and live clinical deployment validation cohort (median [IQR] age, 72.11 [62.26-78.97] years), while 7023 inpatient admissions of older patients with both medical and surgical primary diagnoses represented the combined pre-ML (median [IQR] age, 74.00 [68.00-81.00] years) and post-ML (median [IQR] age, 75.33 [68.34-82.91] years) cohorts. The model presented is a fusion of electronic medical record patient data features and clinical note features processed by natural language processing. The results of model validation in live clinical practice included an area under the curve of 0.94 (95% CI, 0.93-0.95). Median (IQR) monthly delirium detection rates of inpatients assessed for delirium with the Confusion Assessment Method increased from 4.42% (95% CI, 3.70%-5.14%) in the pre-ML cohort to 17.17% (95% CI, 15.54%-18.80%) in the post-ML cohort (P < .001). Post-ML vs pre-ML cohorts received lower daily doses of benzodiazepines (median [IQR], 0.93 [0.42-2.28] diazepam dose equivalents vs 1.60 [0.66-4.27] diazepam dose equivalents; P < .001) and olanzapine (median [IQR], 1.09 [0.38-2.46] mg vs 2.50 [1.17-6.65] mg; P < .001).
Conclusions and Relevance This quality improvement study demonstrates the feasibility of a novel multimodal ML model to automate delirium risk stratification in live clinical practice. The model demonstrated acceptable performance in live clinical practice and may facilitate resource allocation to enhance delirium identification and care.
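As a reading aid, the two headline metrics in the abstract can be illustrated with a minimal Python sketch. The first part reproduces the arithmetic behind the "4-fold increase" from the reported detection rates (4.42% pre-ML, 17.17% post-ML). The second part shows one standard way an area under the receiver operating characteristic curve can be computed, via pairwise ranking of positive and negative cases. The function name `auroc` and the toy scores and labels are assumptions made up for illustration; they are not study data, and this is not the authors' evaluation code.

```python
# Reported monthly delirium detection rates from the abstract (percent).
pre_ml_rate = 4.42
post_ml_rate = 17.17

# Relative (fold) and absolute (percentage-point) change in detection rate.
fold_increase = post_ml_rate / pre_ml_rate
absolute_increase = post_ml_rate - pre_ml_rate


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half). Hypothetical helper for illustration only."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up risk scores and delirium labels (NOT study data).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auroc(toy_scores, toy_labels)

print(f"fold increase: {fold_increase:.2f}")
print(f"absolute increase: {absolute_increase:.2f} percentage points")
print(f"toy AUROC: {toy_auc:.3f}")
```

The ratio of the reported rates works out to about 3.88, which the abstract rounds to a 4-fold increase. The pairwise AUROC definition is quadratic in the number of cases, so for cohorts of the size reported here a rank-based library routine would be the more practical choice.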