PulseAugur / Brief

last 24h
[1/1] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Do LLMs Hold Their Values? MANTA: A Multi-Turn Adversarial Benchmark for Animal Welfare Reasoning

    Researchers have developed MANTA, a benchmark that evaluates how well large language models maintain their ethical stances on animal welfare during multi-turn adversarial conversations. The benchmark consists of 1,088 five-turn dialogues that test both value stability and moral sensitivity. When seven frontier models were tested, including Claude Opus 4.7 and GPT-5.5, some models' performance rankings shifted significantly under sustained pressure, suggesting that their alignment may degrade as that pressure continues.

    IMPACT: This benchmark could reveal vulnerabilities in LLM alignment, prompting developers to improve robustness against adversarial pressure in sensitive ethical domains.