ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

New benchmark ClinEnv tests LLMs as simulated physicians

By the PulseAugur editorial team · [2 sources] · 2026-06-01 17:56

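Researchers have introduced ClinEnv, a new interactive benchmark for evaluating large language models (LLMs) in a simulated clinical environment. The environment gives an LLM the admission information of a real hospitalized patient and asks it to act as the attending physician, gathering information sequentially and committing to irreversible decisions under uncertainty. Unlike static benchmarks, ClinEnv lets the model actively query specialist agents at each stage, which allows a more realistic assessment of both decision-making and information gathering. An initial evaluation of seven models shows a substantial gap: the best-performing model reaches a decision F1 score of only 0.31, indicating considerable room for improvement in clinical reasoning and management.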
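To make the "commit once per stage, then score with F1" idea concrete, here is a minimal Python sketch. It is an illustration only: the summary does not say how ClinEnv defines its decision F1, so the set-based F1 below is an assumption, and `StagedEpisode`, `set_f1`, and the example decision labels are hypothetical names that do not come from the paper.

```python
from dataclasses import dataclass, field


def set_f1(predicted: set[str], reference: set[str]) -> float:
    """Set-based F1 between predicted and reference decisions (assumed metric)."""
    if not predicted and not reference:
        return 1.0
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


@dataclass
class StagedEpisode:
    """Hypothetical multi-stage episode: one irreversible commit per stage."""

    reference: list[set[str]]  # reference decisions for each stage
    committed: list[set[str]] = field(default_factory=list)

    def commit(self, decisions: set[str]) -> None:
        # A commit cannot be revised; it simply advances to the next stage.
        if len(self.committed) >= len(self.reference):
            raise RuntimeError("episode already finished")
        self.committed.append(set(decisions))

    def mean_f1(self) -> float:
        # Stages that were never reached are scored as empty predictions.
        scores = []
        for i, ref in enumerate(self.reference):
            pred = self.committed[i] if i < len(self.committed) else set()
            scores.append(set_f1(pred, ref))
        return sum(scores) / len(scores)


episode = StagedEpisode(reference=[{"cbc", "ecg"}, {"admit_icu"}])
episode.commit({"cbc", "troponin"})  # stage 1: one of two decisions matches
episode.commit({"admit_icu"})        # stage 2: exact match
print(episode.mean_f1())
```

In this toy setup a wrong early decision stays on the record, so the agent cannot recover the lost score later; that is the property the irreversible, sequential framing is meant to capture.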

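Impact: The benchmark could accelerate the development of more capable AI agents for complex, sequential decision-making tasks in specialized domains such as healthcare.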

Ranking rationale: This is a research paper describing a new benchmark environment for evaluating LLMs.

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.AI · Yuxing Lu, Yushuhong Lin, Wenqi Shi, J. Ben Tamo, Xukai Zhao, Jinzhuo Wang, May Dongmei Wang · 2026-06-02 04:00

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

arXiv:2606.02568v1 Abstract: Clinical practice is not the selection of an answer from enumerated options: a physician gathers heterogeneous information incrementally and commits to sequential, irreversible decisions under uncertainty. Static benchmarks cannot p…

arXiv cs.MA (Multiagent) · May Dongmei Wang · 2026-06-01 17:56

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

Clinical practice is not the selection of an answer from enumerated options: a physician gathers heterogeneous information incrementally and commits to sequential, irreversible decisions under uncertainty. Static benchmarks cannot probe and existing interactive medical benchmarks…
