Brief · PulseAugur

RESEARCH · arXiv cs.CL · English (EN) · 1d · [2 sources]

Possible or Definite? A Benchmark for Evaluating Diagnostic Uncertainty Preservation in Clinical Text

A new benchmark evaluates how well large language models (LLMs) preserve diagnostic uncertainty in clinical text. The researchers found that current LLMs often fail to maintain the original level of uncertainty, in some cases preserving it less than half the time. The study identifies this as a critical failure mode in clinical settings: altering an uncertainty expression, for example turning a "possible" diagnosis into a "definite" one, can change the clinical meaning and affect treatment decisions.

IMPACT Highlights a critical failure mode for LLMs in clinical workflows, with consequences for safe deployment and treatment decisions.
