Researchers have introduced CarryOnBench, a new benchmark designed to evaluate how well large language models recover helpfulness in multi-turn conversations after a user clarifies their intent. The benchmark simulates over 5,900 conversations across 14 models, revealing that many models initially withhold information because they misinterpret the request rather than lack the knowledge. While most models improve after clarification, some exhibit failure modes such as utility lock-in or unsafe recovery, which single-turn evaluations miss.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Highlights a critical gap in LLM safety evaluations, suggesting current methods may overlook models that fail to recover after user clarification.
RANK_REASON Academic paper introducing a new benchmark for LLM safety and utility.