"Agents' Last Exam": Claude Fable 5 unexpectedly beaten by GPT 5.5
A new benchmark called Agents' Last Exam (ALE), developed by researchers from UC Berkeley and other institutions, has produced surprising results on AI agent performance. On the most challenging tasks, leading models including Anthropic's Claude Fable 5 and OpenAI's GPT 5.5 scored zero, indicating significant limitations in handling complex, real-world tasks. On slightly less difficult tasks, GPT 5.5 outperformed Claude Fable 5, a reversal of previous benchmark results.
AI IMPACT: This benchmark highlights the gap between AI agents' scores on traditional benchmarks and their performance in practical use, suggesting that current models still struggle with complex, real-world tasks.