PulseAugur / Brief

Last 24 hours · 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
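The brief does not publish how its four signals are combined. The sketch below is one possible approach, not the site's method. It assumes each signal is already normalized to [0, 1], blends authority, cluster strength, and headline signal with hypothetical weights, and applies time decay as an exponential discount with a hypothetical 12-hour half-life:

```python
from dataclasses import dataclass

# Hypothetical weights and half-life: PulseAugur does not document its formula.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 12.0


@dataclass
class Story:
    authority: float         # assumed normalized to [0, 1]
    cluster_strength: float  # assumed normalized to [0, 1]
    headline_signal: float   # assumed normalized to [0, 1]
    age_hours: float         # hours since the story cluster first appeared


def score(story: Story) -> int:
    """Blend three signals, discount by age, and scale the result to 0-100."""
    base = (
        WEIGHTS["authority"] * story.authority
        + WEIGHTS["cluster_strength"] * story.cluster_strength
        + WEIGHTS["headline_signal"] * story.headline_signal
    )
    # Exponential time decay: the score halves every HALF_LIFE_HOURS.
    decay = 0.5 ** (story.age_hours / HALF_LIFE_HOURS)
    # Clamp to [0, 1] before scaling so the score always stays within 0-100.
    return round(100 * min(max(base * decay, 0.0), 1.0))


if __name__ == "__main__":
    fresh = Story(authority=0.9, cluster_strength=0.8, headline_signal=0.7, age_hours=1.0)
    print(score(fresh))
```

A multiplicative decay lets an old story with strong signals fall below a fresh story with moderate ones, which suits a "last 24 hours" view. An additive recency term would be an equally plausible design.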

  1. Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO

    Researchers have introduced AdvGRPO, a co-training framework for adaptive red teaming of language models, in which an attacker model and a defender model are trained against each other with GRPO (Group Relative Policy Optimization). To address GRPO's instability in attacker-defender optimization, AdvGRPO uses dense multi-channel rewards and decoupled advantage normalization. Training follows a curriculum: single-turn attacks first, then multi-turn attacks, and only then co-training of attacker and defender. The authors report that this produces more effective attacks and more robust defenders.

    IMPACT: Introduces a more stable and effective way to test and improve AI safety by co-training a model that attacks and a model that defends.
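GRPO scores each sampled completion relative to the other completions drawn for the same prompt. The advantage is the reward minus the group mean, divided by the group standard deviation. The brief does not say exactly what AdvGRPO decouples. The sketch below assumes it means normalizing each reward channel separately before summing, so that a channel with large values cannot drown out one with small values. The paper may instead, or also, mean normalizing attacker and defender rewards separately. The channel names `harm` and `fluency`, the helper functions, and the use of population standard deviation are illustrative assumptions, not the paper's code.

```python
from statistics import fmean, pstdev

EPS = 1e-6  # guards against division by zero when all rewards in a group are equal


def group_normalize(rewards: list[float]) -> list[float]:
    """Standard GRPO-style advantage: z-score each reward within its sampling group."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # population std; implementations vary on this choice
    return [(r - mu) / (sigma + EPS) for r in rewards]


def coupled_advantages(channels: dict[str, list[float]]) -> list[float]:
    """Sum all reward channels per rollout first, then normalize the totals once."""
    totals = [sum(values) for values in zip(*channels.values())]
    return group_normalize(totals)


def decoupled_advantages(channels: dict[str, list[float]]) -> list[float]:
    """Normalize each reward channel within the group separately, then sum per rollout."""
    per_channel = [group_normalize(values) for values in channels.values()]
    return [sum(parts) for parts in zip(*per_channel)]


if __name__ == "__main__":
    # Hypothetical rewards for four rollouts of one prompt, on very different scales.
    rollouts = {
        "harm": [0.0, 10.0, 0.0, 10.0],
        "fluency": [0.1, 0.2, 0.3, 0.4],
    }
    print("coupled:  ", [round(a, 3) for a in coupled_advantages(rollouts)])
    print("decoupled:", [round(a, 3) for a in decoupled_advantages(rollouts)])
```

With coupled normalization, the `fluency` channel barely separates rollouts that tie on `harm`. With decoupled normalization, each channel contributes on a comparable, unit-variance scale. This is the kind of signal imbalance that per-channel normalization is meant to reduce.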