2025-09-25 Mount Sinai Health System (MSHS)
<Related information>
- https://www.mountsinai.org/about/newsroom/2025/adding-a-lookup-step-makes-ai-better-at-assigning-medical-diagnosis-codes
- https://ai.nejm.org/doi/full/10.1056/AIcs2401161
Assessing Retrieval-Augmented Large Language Models for Medical Coding
Eyal Klang, M.D., Idit Tessler, M.D., Ph.D., Donald U. Apakama, M.D., Ethan Abbott, M.D., Benjamin S. Glicksberg, M.D., Monique Arnold, M.D., Akini Moses, M.D., +8, and Girish N. Nadkarni, M.D.
NEJM AI. Published: September 25, 2025
DOI: 10.1056/AIcs2401161
Abstract
Accurate medical coding is vital for clinical and administrative purposes, but it is often complex and time-consuming. Large language models (LLMs) often struggle with medical coding, producing inaccuracies and hallucinations. We aimed to enhance LLM medical coding by integrating retrieval-augmented generation (RAG). We studied 500 randomly selected emergency department visits from the Mount Sinai Health System. Nine LLMs, both commercial and open-source, were evaluated for primary diagnosis coding according to the International Classification of Diseases, Tenth Revision, Clinical Modification. A RAG system enhanced LLM predictions using data from over 1 million emergency department visits. We compared RAG-enhanced codes with provider-assigned codes. A masked review by four physicians and two LLMs determined which codes — LLM- or provider-assigned — were more accurate and specific. RAG-enhanced LLMs demonstrated superior accuracy and specificity compared with provider-assigned codes. Human reviewers favored RAG-enhanced Generative Pretrained Transformer 4 (GPT-4) for accuracy in 447 instances, versus 277 instances for provider-assigned codes (P<0.001). For specificity, RAG-enhanced GPT-4 was preferred in 509 cases, compared with 181 for provider-assigned codes (P<0.001). Smaller open-access models also showed significant improvement with RAG, demonstrating that integrating RAG with LLM medical coding may reduce errors and improve clinical documentation. (Funded by the National Center for Advancing Translational Sciences and the Office of Research Infrastructure Programs of the National Institutes of Health.)
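The abstract describes retrieval-augmented generation grounding an LLM's ICD-10-CM prediction in data from prior emergency department visits. The sketch below illustrates that general pattern only; it is a minimal illustration under our own assumptions (a toy TF-IDF retriever over a tiny stand-in corpus and a hand-written prompt), not the study's actual retrieval index, embedding method, or prompt.

```python
# Minimal sketch of retrieval-augmented ICD-10-CM code assignment.
# The corpus, retrieval method, and prompt are illustrative assumptions,
# not the authors' implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the historical ED-visit corpus: (note text, provider-assigned code).
HISTORICAL_VISITS = [
    ("chest pain radiating to left arm, elevated troponin", "I21.4"),
    ("productive cough, fever, infiltrate on chest x-ray", "J18.9"),
    ("acute onset right flank pain, hematuria, stone on CT", "N20.0"),
]

def retrieve_similar_visits(note: str, k: int = 2):
    """Return the k most similar historical visits by TF-IDF cosine similarity."""
    corpus = [text for text, _ in HISTORICAL_VISITS]
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(corpus + [note])
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    top_indices = sims.argsort()[::-1][:k]
    return [HISTORICAL_VISITS[i] for i in top_indices]

def build_prompt(note: str) -> str:
    """Assemble a prompt that grounds the LLM in retrieved coded examples."""
    examples = retrieve_similar_visits(note)
    context = "\n".join(f"Note: {text}\nICD-10-CM: {code}" for text, code in examples)
    return (
        "Assign the single most specific ICD-10-CM code for the primary diagnosis.\n"
        f"Similar prior visits:\n{context}\n\n"
        f"Current visit note: {note}\nICD-10-CM:"
    )

if __name__ == "__main__":
    # The resulting prompt would be passed to a commercial or open-source LLM.
    print(build_prompt("crushing substernal chest pain with troponin rise"))
```

In this sketch the retrieved visit-code pairs act as in-context exemplars, which is one common way a RAG step can steer the model toward valid, more specific codes instead of hallucinated ones; the paper's own pipeline and evaluation details are in the linked NEJM AI article.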


