OpenAI's GPT-5.5 has outperformed Anthropic's Claude Fable 5 on a new AI benchmark called Agents Last Exam (ALE). This benchmark, developed by Berkeley RDI with input from over 300 experts, tests autonomous AI agents. The result is surprising, as Claude Fable 5 was previously considered the leading model for such tasks.
IMPACT Sets a new performance standard for AI agents, potentially shifting the competitive landscape and influencing future development priorities.
RANK_REASON New model version (GPT-5.5) release with benchmark performance data.
AI-generated summary · Google Gemini · from 1 source.