PulseAugur / Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Introducing: DNR-Bench: Do-not-respond Benchmark

    DNR-Bench is a new benchmark that evaluates whether large language models can refrain from responding to specific prompts. Across several leading models, including GPT-5.1, Claude Opus 4.8, Gemini 3 Pro, and Grok 4, it reported a 0.0% pass rate: none of the tested models refrained from generating output when presented with the test prompt. The benchmark's methodology and code are available on GitHub.

    IMPACT The reported result points to a safety-relevant gap in current LLMs: none of the tested models withheld a response, which suggests a need for improved alignment and refusal capabilities.