2023-07-28 University of California, Riverside (UCR)
◆ Combining the strengths of both could produce a better system. For queries about serious conditions such as dementia, Google provided more current information from reliable sources but sometimes drew on commercial or referral services, while ChatGPT gave more relevant and objective responses but its information could be out of date and it rarely cited its sources. Both platforms had readability problems, and their responses may be hard to understand for people with low education levels or limited health literacy skills.
◆ As future improvements, explicitly providing the source and date of information, along with availability in other languages, could increase the value of these platforms.
<Related information>
- https://news.ucr.edu/articles/2023/07/28/google-chatgpt-have-mixed-results-medical-info-queries
- https://www.jmir.org/2023/1/e48966
ChatGPT vs Google for Queries Related to Dementia and Other Cognitive Decline: Comparison of Results
Vagelis Hristidis, Nicole Ruggiano, Ellen L Brown, Sai Rithesh Reddy Ganta, Selena Stewart
Journal of Medical Internet Research, published May 16, 2023
DOI:https://preprints.jmir.org/preprint/48966
Abstract
Background: People living with dementia or other cognitive decline and their caregivers (PLWD) increasingly rely on the web to find information about their condition and available resources and services. The recent advancements in large language models (LLMs), such as ChatGPT, provide a new alternative to the more traditional web search engines, such as Google.
Objective: This study compared the quality of the results of ChatGPT and Google for a collection of PLWD-related queries.
Methods: A set of 30 informational and 30 service delivery (transactional) PLWD-related queries were selected and submitted to both Google and ChatGPT. Three domain experts assessed the results for their currency of information, reliability of the source, objectivity, relevance to the query, and similarity of their response. The readability of the results was also analyzed. Interrater reliability coefficients were calculated for all outcomes.
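The abstract does not say which interrater reliability coefficient was calculated. For three raters assigning categorical ratings, Fleiss' kappa is a common choice; the sketch below is an illustrative implementation of that statistic, not the authors' actual analysis code.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for N subjects rated by n raters into k categories.

    ratings[i][j] = number of raters who assigned subject i to category j;
    every row must sum to the same number of raters n.
    """
    n = sum(ratings[0])   # raters per subject (assumed constant)
    N = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of categories

    # Observed per-subject agreement, averaged over subjects
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N

    # Chance agreement from the marginal category proportions
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)

    if P_e == 1.0:        # all raters used a single category throughout
        return 1.0
    return (P_bar - P_e) / (1 - P_e)
```

With perfect agreement on two subjects split across two categories, the function returns 1.0; mixed ratings drive the value toward (or below) zero.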
Results: Google had superior currency and higher reliability. ChatGPT results were evaluated as more objective. ChatGPT had a significantly higher response relevance, while Google often drew upon sources that were referral services for dementia care or service providers themselves. The readability was low for both platforms, especially for ChatGPT (mean grade level 12.17, SD 1.94) compared to Google (mean grade level 9.86, SD 3.47). The similarity between the content of ChatGPT and Google responses was rated as high for 13 (21.7%) responses, medium for 16 (26.7%) responses, and low for 31 (51.6%) responses.
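The readability figures above are US school grade levels, though the abstract does not name the formula used; the Flesch-Kincaid grade level is one common grade-level metric. A minimal sketch, with a rough vowel-group syllable counter rather than the exact tool the authors used:

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count vowel groups, ignoring a trailing silent 'e'.
    word = re.sub(r"e$", "", word.lower())
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)
```

Short, monosyllabic sentences score near (or below) grade 0, while the long, polysyllabic sentences typical of health information push scores toward the grade 10-12 range reported in the study.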
Conclusions: Both Google and ChatGPT have strengths and weaknesses. ChatGPT rarely includes the source of a result. Google more often provides a date for and a known reliable source of the response compared to ChatGPT, whereas ChatGPT supplies more relevant responses to queries. The results of ChatGPT may be out of date and often do not specify a validity time stamp. Google sometimes returns results based on commercial entities. The readability scores for both indicate that responses are often not appropriate for persons with low health literacy skills. In the future, the addition of both the source and the date of health-related information and availability in other languages may increase the value of these platforms for both nonmedical and medical professionals.