Is AI in Medicine Playing Fair?


2025-04-07 Mount Sinai Health System (MSHS)

Researchers at the Icahn School of Medicine at Mount Sinai have shown that generative artificial intelligence (AI) models can make different treatment recommendations depending on a patient's socioeconomic and demographic background. Using several large language models (LLMs), the study tested whether recommendations for identical clinical scenarios shifted with a patient's age, race, gender, or insurance status. The models returned inconsistent recommendations for certain patient groups, suggesting that AI in medicine faces real fairness challenges. The researchers stress that AI models require rigorous validation and monitoring in their design and deployment, along with safeguards to ensure that every patient receives safe, effective, and equitable care.

<Related information>

Sociodemographic biases in medical decision making by large language models

Mahmud Omar, Shelly Soffer, Reem Agbareia, Nicola Luigi Bragazzi, Donald U. Apakama, Carol R. Horowitz, Alexander W. Charney, Robert Freeman, Benjamin Kummer, Benjamin S. Glicksberg, Girish N. Nadkarni & Eyal Klang
Nature Medicine, Published: 07 April 2025
DOI: https://doi.org/10.1038/s41591-025-03626-6


Abstract

Large language models (LLMs) show promise in healthcare, but concerns remain that they may produce medically unjustified clinical care recommendations reflecting the influence of patients’ sociodemographic characteristics. We evaluated nine LLMs, analyzing over 1.7 million model-generated outputs from 1,000 emergency department cases (500 real and 500 synthetic). Each case was presented in 32 variations (31 sociodemographic groups plus a control) while holding clinical details constant. Compared to both a physician-derived baseline and each model’s own control case without sociodemographic identifiers, cases labeled as Black or unhoused or identifying as LGBTQIA+ were more frequently directed toward urgent care, invasive interventions or mental health evaluations. For example, certain cases labeled as being from LGBTQIA+ subgroups were recommended mental health assessments approximately six to seven times more often than clinically indicated. Similarly, cases labeled as having high-income status received significantly more recommendations (P < 0.001) for advanced imaging tests such as computed tomography and magnetic resonance imaging, while low- and middle-income-labeled cases were often limited to basic or no further testing. After applying multiple-hypothesis corrections, these key differences persisted. Their magnitude was not supported by clinical reasoning or guidelines, suggesting that they may reflect model-driven bias, which could eventually lead to health disparities rather than acceptable clinical variation. Our findings, observed in both proprietary and open-source models, underscore the need for robust bias evaluation and mitigation strategies to ensure that LLM-driven medical advice remains equitable and patient centered.
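
For readers who want to probe a model the same way, here is a minimal sketch of the counterfactual-prompting design the abstract describes: one fixed clinical vignette rendered under different sociodemographic labels, recommendation rates tallied per group, and each group tested against the unlabeled control with a multiple-comparison correction. This is not the authors' code; the abridged label list, the simulated query_model stand-in, and the choice of two-proportion z-tests with Benjamini-Hochberg correction are illustrative assumptions, since the abstract does not specify the paper's exact statistical procedure.

```python
import random
from collections import Counter

from statsmodels.stats.multitest import multipletests
from statsmodels.stats.proportion import proportions_ztest

# Abridged label set; the study used 31 sociodemographic groups plus a control.
LABELS = ["control", "Black", "unhoused", "LGBTQIA+",
          "high-income", "low-income"]

OUTCOMES = ["mental health evaluation", "advanced imaging (CT/MRI)",
            "basic testing", "no further testing"]

def make_prompt(vignette: str, label: str) -> str:
    """Hold clinical details constant; vary only the sociodemographic label."""
    patient = "A patient" if label == "control" else f"A {label} patient"
    return f"{patient} presents as follows: {vignette}\nRecommend the next step."

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with an actual API request."""
    return random.choice(OUTCOMES)

def recommendation_counts(vignettes, n_runs=10):
    """Tally how often each labeled group receives each recommendation."""
    counts = {label: Counter() for label in LABELS}
    for vignette in vignettes:
        for label in LABELS:
            for _ in range(n_runs):  # repeated sampling per prompt variant
                counts[label][query_model(make_prompt(vignette, label))] += 1
    return counts

def compare_to_control(counts, outcome, n_per_group):
    """Two-proportion z-test of each group vs. control, BH-corrected."""
    groups = [label for label in LABELS if label != "control"]
    pvals = []
    for label in groups:
        _, p = proportions_ztest(
            count=[counts[label][outcome], counts["control"][outcome]],
            nobs=[n_per_group, n_per_group])
        pvals.append(p)
    reject, p_adj, _, _ = multipletests(pvals, method="fdr_bh")
    return {g: (p, r) for g, p, r in zip(groups, p_adj, reject)}

if __name__ == "__main__":
    vignettes = ["45-year-old with acute chest pain radiating to the left arm."]
    n_runs = 200
    counts = recommendation_counts(vignettes, n_runs=n_runs)
    n_per_group = len(vignettes) * n_runs
    print(compare_to_control(counts, "mental health evaluation", n_per_group))
```

Scaled up to 1,000 cases, 32 label variants, nine models, and repeated sampling, the same loop produces output volumes on the order the paper reports; the z-test here merely stands in for whatever statistical framework the authors actually applied.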

Medicine & Health