PulseAugur

New benchmark reveals LLMs struggle to balance safety and helpfulness in healthcare

A new benchmark, Health-ORSC-Bench, evaluates the safety alignment of large language models in healthcare contexts. It targets two failure modes, over-refusal of benign queries and unsafe compliance with harmful ones, by measuring "Safe Completion": helpful, high-level guidance that does not cross into harmful territory. Evaluations of 30 LLMs, including GPT-5 and Claude 4, found that safety-optimized models often refuse a significant portion of benign queries, while domain-specific models may compromise safety for utility. The research indicates that larger frontier models tend to exhibit "safety-pessimism" and higher over-refusal rates than smaller or MoE-based models, highlighting the ongoing challenge of balancing refusal and compliance.

IMPACT This benchmark could support the development of more nuanced and reliable medical AI assistants by providing a standard for evaluating both safety and helpfulness.

RANK_REASON The cluster is about a new academic paper introducing a benchmark for LLM safety in healthcare.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Zhihao Zhang, Liting Huang, Guanghao Wu, Preslav Nakov, Heng Ji, Usman Naseem

    Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context

    arXiv:2601.17642v2. Abstract: Safety alignment in Large Language Models is critical for healthcare; however, reliance on binary refusal boundaries often results in over-refusal of benign queries or unsafe compliance with harmful ones. While existing benchmar…