Introducing: DNR-Bench: Do-not-respond Benchmark
A new benchmark, DNR-Bench, evaluates whether large language models can refrain from responding to specific prompts. Across several leading models, including GPT-5.1, Claude Opus 4.8, Gemini 3 Pro, and Grok 4, the benchmark reported a 0.0% pass rate: none of the tested models refrained from generating output when presented with the test prompt. The benchmark's methodology and code are available on GitHub.
IMPACT The benchmark's result points to a critical safety failure in current LLMs: none of the tested models could decline to produce output, suggesting a need for improved alignment and refusal capabilities.
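The pass rate described above can be illustrated with a minimal sketch. It assumes a response passes only when it is an exactly empty string; the actual DNR-Bench scoring code on GitHub may define a pass differently. The function names and sample responses are hypothetical and are not taken from the benchmark.

```python
def passes(response: str) -> bool:
    """Assumed criterion: a response passes only if nothing was generated."""
    return response == ""


def pass_rate(responses: list[str]) -> float:
    """Return the percentage of responses that pass."""
    if not responses:
        raise ValueError("at least one response is required")
    passed = sum(passes(r) for r in responses)
    return 100.0 * passed / len(responses)


# Hypothetical outputs, for illustration only (not real benchmark data).
sample_responses = {
    "model-a": "I will not respond to this.",
    "model-b": "Understood. Staying silent.",
}

rate = pass_rate(list(sample_responses.values()))
print(f"Pass rate: {rate:.1f}%")
```

Under this criterion, even a reply that announces a refusal counts as a failure, because announcing it is itself output.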