PulseAugur / Brief

Last 24 hours, 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
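The brief does not publish its scoring formula. The sketch below is a minimal illustration of how four such signals could be combined into a 0–100 score. It assumes each signal is already normalized to [0, 1], uses hypothetical weights (`WEIGHTS`), and applies time decay as a multiplicative exponential with a hypothetical 12-hour half-life (`HALF_LIFE_HOURS`). None of these names or values come from PulseAugur.

```python
from dataclasses import dataclass

# Hypothetical weights for the three content signals; they sum to 1.0.
# These are illustrative assumptions, not PulseAugur's actual weighting.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 12.0  # hypothetical: score halves every 12 hours


@dataclass
class Cluster:
    authority: float         # 0..1, reputation of the sources in the cluster
    cluster_strength: float  # 0..1, how many independent sources cover the story
    headline_signal: float   # 0..1, salience of the headline wording
    age_hours: float         # hours since the story first appeared


def _clamp(x: float) -> float:
    """Keep a signal inside [0, 1] so the final score stays within 0..100."""
    return max(0.0, min(1.0, x))


def score(c: Cluster) -> int:
    """Weighted sum of the signals, scaled by exponential time decay, as 0..100."""
    base = sum(weight * _clamp(getattr(c, name)) for name, weight in WEIGHTS.items())
    decay = 0.5 ** (max(0.0, c.age_hours) / HALF_LIFE_HOURS)
    return round(100 * base * decay)
```

With these assumptions, a cluster that is maximal on every signal scores 100 when fresh and 50 after 12 hours. Treating decay as a multiplier, rather than as a fourth additive term, guarantees that old stories eventually fall toward 0 however strong their other signals are.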

  1. LLM agent safety, multi-turn red-teaming, jailbreak benchmarks, adversarial robustness, safety-critical systems

    Researchers have developed NRT-Bench, a benchmark that tests the safety and robustness of large language model (LLM) agents in safety-critical systems. It simulates a nuclear power plant control room in which LLM agents act as operators and face multi-turn adversarial attacks. Across four frontier models, adaptive attacks caused safety failures in 8.7% to 12.1% of sessions, and the vulnerabilities they exploited were largely disjoint between models. The study also found that defensive measures change attack success rates in unpredictable, model-dependent ways.
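One way to make "largely disjoint" concrete is to compare the sets of attack scenarios that broke each model, using pairwise Jaccard overlap, and to report each model's session failure rate next to it. The sketch below does this. The model names and scenario IDs are made-up placeholders, not NRT-Bench data. Jaccard overlap is our choice of metric for illustration and may not be how the study measured disjointness.

```python
from itertools import combinations


def failure_rate(outcomes: list[bool]) -> float:
    """Fraction of sessions that ended in a safety failure (True = failure)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|; 0.0 means the two failure sets share nothing."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(failing: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap of failing scenarios for every pair of models."""
    return {
        (m, n): jaccard(failing[m], failing[n])
        for m, n in combinations(sorted(failing), 2)
    }


# Hypothetical placeholder data: scenario IDs that caused a failure, per model.
failing = {
    "model_a": {"s01", "s04", "s07"},
    "model_b": {"s02", "s04", "s09"},
    "model_c": {"s03", "s05", "s08"},
}

if __name__ == "__main__":
    for pair, overlap in pairwise_overlap(failing).items():
        print(pair, round(overlap, 2))
```

On this toy data, `model_a` and `model_b` share one of five distinct failing scenarios (overlap 0.2), and every other pair shares none. Low pairwise overlap like this is what would make a single shared defense insufficient, since each model fails on different inputs.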

    LLM agent safety, multi-turn red-teaming, jailbreak benchmarks, adversarial robustness, safety-critical systems

    IMPACT: Highlights the need for robust safety evaluations of LLM agents in critical systems and reveals model-dependent vulnerabilities.