Improving efficiency, reliability of AI medical summarization tools


2024-02-21 Pennsylvania State University (Penn State)

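Medical summarization, a process that uses artificial intelligence (AI) to summarize medical documents, is currently used in health care settings to create electronic health records and to process insurance claims. Because these practices are labor-intensive, researchers at Penn State looked for a more efficient approach. They introduced a framework for fine-tuning the training of the natural language processing (NLP) models used to generate medical summaries, and proposed a way to produce reliable results efficiently. The method, called the Faithfulness for Medical Summarization (FaMeSumm) framework, makes it possible to improve the reliability of medical summaries generated by existing AI models.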


FaMeSumm: Investigating and Improving Faithfulness of Medical Summarization

Nan Zhang, Yusen Zhang, Wu Guo, Prasenjit Mitra, Rui Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing


Summaries of medical text shall be faithful by being consistent and factual with source inputs, which is an important but understudied topic for safety and efficiency in healthcare. In this paper, we investigate and improve faithfulness in summarization on a broad range of medical summarization tasks. Our investigation reveals that current summarization models often produce unfaithful outputs for medical input text. We then introduce FaMeSumm, a framework to improve faithfulness by fine-tuning pre-trained language models based on medical knowledge. FaMeSumm performs contrastive learning on designed sets of faithful and unfaithful summaries, and it incorporates medical terms and their contexts to encourage faithful generation of medical terms. We conduct comprehensive experiments on three datasets in two languages: health question and radiology report summarization datasets in English, and a patient-doctor dialogue dataset in Chinese. Results demonstrate that FaMeSumm is flexible and effective by delivering consistent improvements over mainstream language models such as BART, T5, mT5, and PEGASUS, yielding state-of-the-art performances on metrics for faithfulness and general quality. Human evaluation by doctors also shows that FaMeSumm generates more faithful outputs. Our code is available at
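
The abstract says FaMeSumm performs contrastive learning on sets of faithful and unfaithful summaries. The sketch below illustrates that general idea only. It is a generic InfoNCE-style contrastive loss, not the paper's actual objective. The function name `contrastive_loss`, the `temperature` parameter, and the assumption that each summary is represented by a single scalar score (for example, a model log-likelihood) are all hypothetical choices made for illustration. The medical-term component mentioned in the abstract is not covered.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def contrastive_loss(faithful_scores, unfaithful_scores, temperature=1.0):
    """Generic InfoNCE-style loss (an illustrative assumption, not FaMeSumm's exact loss).

    Each faithful summary score is contrasted against all unfaithful
    summary scores. The loss shrinks as faithful summaries score higher
    than unfaithful ones.
    """
    if not faithful_scores or not unfaithful_scores:
        raise ValueError("need at least one faithful and one unfaithful score")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    negatives = [s / temperature for s in unfaithful_scores]
    losses = []
    for score in faithful_scores:
        positive = score / temperature
        # -log( exp(pos) / (exp(pos) + sum(exp(neg))) )
        losses.append(_logsumexp([positive] + negatives) - positive)
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical scores: higher means the model prefers that summary.
    good_ranking = contrastive_loss([2.0], [0.0])
    bad_ranking = contrastive_loss([0.0], [2.0])
    print(good_ranking < bad_ranking)
```

In a fine-tuning setup, a term like this would typically be added to the usual summarization loss so that the model is pushed to rank faithful summaries above unfaithful ones. How the two terms are weighted is a design choice that this sketch does not address.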