Human-AI collectives make the most accurate medical diagnoses

2025-06-20 Max Planck Institute

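An international research team from the Max Planck Institute and other institutions has found that diagnoses made by "hybrid collectives" combining humans and AI are the most accurate. Physicians alone, AI alone (five models, including ChatGPT-4), and mixed teams each diagnosed 2,133 clinical cases. AI on its own already outperformed 85% of the physicians, but mixed human-AI collectives were more accurate still. The likely explanation is that humans and AI make different kinds of errors and compensate for one another's mistakes, which raises overall accuracy. The study points to a new way of improving diagnostic accuracy and equity by using AI to assist physicians.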

<Related information>

Human–AI collectives most accurately diagnose clinical vignettes

Nikolas Zöller, Julian Berger, Irving Lin, +9, and Stefan M. Herzog
Proceedings of the National Academy of Sciences, Published: June 13, 2025
DOI: https://doi.org/10.1073/pnas.2426153122

Significance

Large language models (LLMs) have great potential for high-stakes applications such as medical diagnostics but face challenges including hallucinations, biases, and lack of common sense. We address these limitations through a hybrid human–AI system that combines physicians’ expertise with LLMs to generate accurate differential medical diagnoses. Analyzing over 2,000 text-based medical case vignettes, hybrid collectives outperform individual physicians, standalone LLMs, and groups composed solely of physicians or LLMs, by leveraging complementary strengths while mitigating their distinct weaknesses. Our findings underscore the transformative potential of human–AI collaboration to enhance decision-making in complex, open-ended domains, paving the way for safer, more equitable applications of AI in medicine and beyond.

Abstract

AI systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased—shortcomings that may reflect LLMs’ inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here, we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the-art LLMs across 2,133 text-based medical case vignettes. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience and can be attributed to humans’ and LLMs’ complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
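
To make the idea of pooling differential diagnoses concrete, the sketch below merges several ranked lists into one collective ranking. It is only an illustration under stated assumptions: the abstract does not describe the aggregation rule, so the reciprocal-rank scoring, the function name `pool_differentials`, the `top_k` cutoff, and the example diagnoses are all hypothetical and are not taken from the paper or from real clinical data.

```python
from collections import defaultdict


def pool_differentials(ranked_lists, top_k=5):
    """Merge ranked differential diagnoses into one collective ranking.

    Each diagnosis earns 1/rank points from every list that names it.
    This scoring rule is an assumption made for illustration; it is
    not the method reported in the paper.
    """
    scores = defaultdict(float)
    for diagnoses in ranked_lists:
        for rank, diagnosis in enumerate(diagnoses[:top_k], start=1):
            # Normalize case and whitespace so identical labels are pooled.
            scores[diagnosis.strip().lower()] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked lists from one physician and two LLMs.
physician = ["pulmonary embolism", "pneumonia", "pericarditis"]
llm_a = ["pneumonia", "pulmonary embolism", "heart failure"]
llm_b = ["pulmonary embolism", "heart failure", "pneumonia"]

collective = pool_differentials([physician, llm_a, llm_b])
print(collective)
```

In this toy case a diagnosis ranked highly by several contributors rises to the top, while one that only a single contributor lists low falls to the bottom. That is one simple way a collective can dampen an individual member's idiosyncratic errors.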
