Possible or Definite? A Benchmark for Evaluating Diagnostic Uncertainty Preservation in Clinical Text

New benchmark reveals that large language models struggle to handle diagnostic uncertainty in clinical text

By the PulseAugur editorial team · [2 sources] · 2026-06-16 20:30

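A new benchmark has been developed to evaluate how well large language models (LLMs) preserve diagnostic uncertainty in clinical text. The study found that current LLMs often fail to maintain the original level of uncertainty, in some cases preserving it less than half the time. The work highlights a critical failure mode of LLMs in clinical settings: changing how uncertainty is expressed (for example, turning a "possible" diagnosis into a definite one) can substantially alter the clinical meaning of a note and affect treatment decisions.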

Impact: Highlights a critical failure mode of LLMs in clinical workflows, with consequences for safe deployment and treatment decisions.

Ranking rationale: This cluster contains an academic paper detailing a new benchmark and its evaluation of LLMs.

AI-generated summary · Google Gemini · from 2 sources.
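The summary does not say how the benchmark scores preservation. Purely as an illustration of what "maintaining the original level of uncertainty" could mean, here is a minimal sketch that assumes a small hand-written lexicon mapping hedge phrases to three hypothetical certainty levels (possible, probable, definite) and counts how often a rewrite keeps the source sentence's level. The cue list, the level names, and the functions `certainty_level` and `preservation_rate` are all hypothetical; they are not the paper's taxonomy or metric.

```python
import re

# Hypothetical cue lexicon (an assumption, not the benchmark's taxonomy):
# maps hedge phrases to a coarse certainty level.
CERTAINTY_CUES = {
    "cannot be ruled out": "possible",
    "may represent": "possible",
    "possibly": "possible",
    "possible": "possible",
    "concerning for": "probable",
    "consistent with": "probable",
    "likely": "probable",
    "probable": "probable",
    "diagnosed with": "definite",
    "confirmed": "definite",
}


def certainty_level(sentence: str) -> str:
    """Return the level of the longest matching cue; default to 'definite'."""
    text = sentence.lower()
    # Try longer cues first so multi-word phrases win over shorter ones.
    for cue in sorted(CERTAINTY_CUES, key=len, reverse=True):
        # Word boundaries keep "likely" from matching inside "unlikely".
        if re.search(r"\b" + re.escape(cue) + r"\b", text):
            return CERTAINTY_CUES[cue]
    # Assumption: a sentence with no hedge cue is read as a definite statement.
    return "definite"


def preservation_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (source, rewrite) pairs whose certainty level is unchanged."""
    if not pairs:
        raise ValueError("need at least one (source, rewrite) pair")
    kept = sum(certainty_level(src) == certainty_level(out) for src, out in pairs)
    return kept / len(pairs)


# Toy (source, LLM rewrite) pairs, invented for illustration only.
pairs = [
    ("Findings may represent early pneumonia.", "Findings show early pneumonia."),  # possible -> definite
    ("Likely viral bronchitis.", "Probable viral bronchitis."),  # probable -> probable
    ("Possible pulmonary embolism.", "Pulmonary embolism is possible."),  # possible -> possible
    ("Consistent with a small effusion.", "Small effusion confirmed."),  # probable -> definite
]

print(f"preservation rate: {preservation_rate(pairs):.2f}")  # prints "preservation rate: 0.50"
```

Keyword matching like this ignores negation and scope (for example, "unlikely" matches no cue and falls through to the definite default), so it is at best a rough proxy for the kind of judgment a curated benchmark is meant to test.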

Coverage sources [2]

arXiv cs.CL TIER_1 · Hongbo Du, Zixin Lu, Jiaming Qu · 2026-06-18 04:00

Possible or Definite? A Benchmark for Evaluating Diagnostic Uncertainty Preservation in Clinical Text

arXiv:2606.18471v1. Abstract: Large language models (LLMs) are increasingly used for clinical text tasks such as summarization and revision. While most studies evaluate the fluency and coherence of LLM-generated text, whether LLMs correctly preserve diagnostic u…
arXiv cs.CL TIER_1 · Jiaming Qu · 2026-06-16 20:30

Possible or Definite? A Benchmark for Evaluating Diagnostic Uncertainty Preservation in Clinical Text

Large language models (LLMs) are increasingly used for clinical text tasks such as summarization and revision. While most studies evaluate the fluency and coherence of LLM-generated text, whether LLMs correctly preserve diagnostic uncertainty remains underexplored. In clinical pr…