AIPatient Arena: EHR-grounded evaluation of large language models in end-to-end clinical consultation workflows

New framework uses electronic health record data to evaluate large language models in clinical consultations

By PulseAugur Editorial Team · [2 sources] · 2026-06-16 03:35

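Researchers have developed AIPatient Arena, a new framework for evaluating large language models (LLMs) in clinical consultation settings. The framework uses electronic health records (EHRs) to simulate realistic, multi-turn doctor-patient interactions. Although LLMs showed strengths in questioning skills, ethical behavior, and explanation, they fell short in information integration, medication safety, handling ambiguity, information coverage, and diagnostic accuracy.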

Impact: Underscores the need to evaluate LLMs in healthcare comprehensively, going beyond simple accuracy to focus on interaction and reasoning.

Ranking rationale: This cluster contains an academic paper detailing a new evaluation framework for LLMs in a specific domain.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Jiahui Niu, Huizi Yu, Wenkong Wang, Guangxin Dai, Jingxian He, Xiang Li, Zhiying Liang, Xinxin Lin, Kent CY So, Bryan YP Yan, Yun Kwok Wing, Yanqiu Xing, Xin Ma, Lizhou Fan · 2026-06-17 04:00

AIPatient Arena: EHR-grounded evaluation of large language models in end-to-end clinical consultation workflows

arXiv:2606.17474v1 · Announce type: cross · Abstract: Large language models (LLMs) are increasingly considered for use in clinical consultation tasks, yet most medical evaluations remain static, single-turn, or narrowly outcome-based, limiting their ability to reflect the sequential,…

arXiv cs.CL TIER_1 English(EN) · Lizhou Fan · 2026-06-16 03:35

AIPatient Arena: EHR-grounded evaluation of large language models in end-to-end clinical consultation workflows

Large language models (LLMs) are increasingly considered for use in clinical consultation tasks, yet most medical evaluations remain static, single-turn, or narrowly outcome-based, limiting their ability to reflect the sequential, uncertain, and interactive nature of real-world c…