Large Language Models Show Promise in Predicting Liver Cancer Treatment Outcomes

2025-06-12 Chinese Academy of Sciences (CAS)

Workflow of the comparative study of LLMs and physicians in predicting immunotherapy response in liver cancer (Image by WANG Tengfei)

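A research team led by Prof. LI Hai of the Hefei Institutes of Physical Science, Chinese Academy of Sciences, has carried out the first systematic study of using large language models (LLMs) to predict treatment response in hepatocellular carcinoma (HCC). Using zero-shot prompting with GPT-4, GPT-4o, Gemini, and DeepSeek, the team analyzed data from 186 patients with unresectable HCC. Their newly developed Gemini-GPT model, which combines GPT-4, GPT-4o, and Gemini, matched the accuracy of senior physicians and predicted outcomes faster and more accurately than junior physicians. The AI was particularly effective at identifying patients likely to respond to treatment, and it is expected to find use in clinical practice.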

<Related Information>

Predicting Immunotherapy Response in Unresectable Hepatocellular Carcinoma: A Comparative Study of Large Language Models and Human Experts

Jun Xu,Junjie Wang,Junjun Li,Zhangxiang Zhu,Xiao Fu,Wei Cai,Ruipeng Song,Tengfei Wang & Hai Li
Journal of Medical Systems, Published: 15 May 2025
DOI: https://doi.org/10.1007/s10916-025-02192-1

Abstract

Hepatocellular carcinoma (HCC) is an aggressive cancer with limited biomarkers for predicting immunotherapy response. Recent advancements in large language models (LLMs) like GPT-4, GPT-4o, and Gemini offer the potential for enhancing clinical decision-making through multimodal data analysis. However, their effectiveness in predicting immunotherapy response, especially compared to human experts, remains unclear. This study assessed the performance of GPT-4, GPT-4o, and Gemini in predicting immunotherapy response in unresectable HCC, compared to radiologists and oncologists of varying expertise. A retrospective analysis of 186 patients with unresectable HCC utilized multimodal data (clinical and CT images). LLMs were evaluated with zero-shot prompting and two strategies: the ‘voting method’ and the ‘OR rule method’ for improved sensitivity. Performance metrics included accuracy, sensitivity, area under the curve (AUC), and agreement across LLMs and physicians. GPT-4o, using the ‘OR rule method,’ achieved 65% accuracy and 47% sensitivity, comparable to intermediate physicians but lower than senior physicians (accuracy: 72%, p = 0.045; sensitivity: 70%, p < 0.0001). Gemini-GPT, combining GPT-4, GPT-4o, and Gemini, achieved an AUC of 0.69, similar to senior physicians (AUC: 0.72, p = 0.35), with 68% accuracy, outperforming junior and intermediate physicians while remaining comparable to senior physicians (p = 0.78). However, its sensitivity (58%) was lower than senior physicians (p = 0.0097). LLMs demonstrated higher inter-model agreement (κ = 0.59–0.70) than inter-physician agreement, especially among junior physicians (κ = 0.15). This study highlights the potential of LLMs, particularly Gemini-GPT, as valuable tools in predicting immunotherapy response for HCC.
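The abstract names two ways of pooling LLM outputs (the ‘voting method’ and the ‘OR rule method’) and reports accuracy, sensitivity, and κ agreement. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes each model (or each repeated query) returns a binary call, 1 for responder and 0 for non-responder; that the voting method is a strict majority vote; that the OR rule flags a patient as a responder if any call does; and that κ is pairwise Cohen's kappa. The function names, toy labels, and model calls are hypothetical and are not study data.

```python
# Minimal sketch, NOT the study's code. Assumptions (hypothetical):
# 1 = responder, 0 = non-responder; every list covers the same patients
# in the same order; "voting" = strict majority, "OR rule" = any positive.
from typing import List, Sequence


def majority_vote(calls: Sequence[Sequence[int]]) -> List[int]:
    """Assumed 'voting method': responder only if more than half the calls say so."""
    n = len(calls)
    return [1 if 2 * sum(per_patient) > n else 0 for per_patient in zip(*calls)]


def or_rule(calls: Sequence[Sequence[int]]) -> List[int]:
    """Assumed 'OR rule method': responder if at least one call says so."""
    return [1 if any(per_patient) else 0 for per_patient in zip(*calls)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patients whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True-positive rate: share of actual responders predicted as responders."""
    positives = sum(y_true)
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return true_pos / positives if positives else 0.0


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Pairwise Cohen's kappa for two raters' binary calls (chance-corrected agreement)."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    a_pos, b_pos = sum(a) / n, sum(b) / n
    # Agreement expected by chance if both raters labelled independently.
    p_exp = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if p_exp == 1:  # both raters constant and identical: define kappa as 1
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical toy data for six patients (not from the study).
y_true = [1, 0, 1, 1, 0, 0]
model_calls = [
    [1, 0, 0, 1, 0, 0],  # hypothetical model A
    [0, 0, 1, 1, 1, 0],  # hypothetical model B
    [1, 0, 0, 0, 0, 0],  # hypothetical model C
]

for name, combine in [("vote", majority_vote), ("OR rule", or_rule)]:
    pred = combine(model_calls)
    print(name, pred, accuracy(y_true, pred), sensitivity(y_true, pred))
print("kappa A vs B:", cohen_kappa(model_calls[0], model_calls[1]))
```

On this toy data the OR rule flags all three hypothetical responders (sensitivity 1.0 versus 2/3 for the majority vote) while accuracy stays at 5/6. This illustrates the trade-off behind using the OR rule "for improved sensitivity": a single positive call is enough, so false negatives fall but false positives can rise.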
