PulseAugur / Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. "Agents' Last Exam": GPT 5.5 unexpectedly beats Claude Fable 5

    A new benchmark, Agents' Last Exam (ALE), developed by researchers from UC Berkeley and other institutions, tests AI agents on complex, real-world tasks. On its most challenging tasks, leading models including Anthropic's Claude Fable 5 and OpenAI's GPT 5.5 scored zero, indicating significant limitations in handling such work. On slightly less difficult tasks, GPT 5.5 outperformed Claude Fable 5, a reversal of previous benchmark results.

    IMPACT This benchmark highlights the gap between theoretical performance and practical application for AI agents, suggesting current models struggle with complex, real-world tasks despite strong performance on traditional benchmarks.