Are LLMs Ready to Assist Physicians? PhysAssistBench for Interactive Doctor-Patient-EHR Assistance
Researchers have developed PhysAssistBench, a benchmark for evaluating how well Large Language Models (LLMs) can assist physicians. It focuses on interactive scenarios involving doctors, patients, and electronic health records (EHRs), which require an LLM to coordinate clinical knowledge, communication, and system interaction. Built from real cases in MIMIC-IV, PhysAssistBench includes 1,296 physician-validated turns. Initial experiments indicate that current LLMs are not yet reliable enough for this clinical assistance role and point to the need for better integration of these capabilities.
IMPACT: Highlights a critical gap in LLM capabilities for real-world clinical assistance, indicating that further development is needed before reliable integration into healthcare workflows.