OpenAI's new GPT-5.5 model has reportedly outperformed Anthropic's Claude Fable 5 on the challenging Agents' Last Exam benchmark. This result suggests a significant advancement in AI agent capabilities, potentially shifting the competitive landscape. AI
IMPACT Sets a new performance bar for AI agents, potentially influencing future development and evaluation methodologies.
RANK_REASON New model release from a frontier lab with benchmark results. [lever_c_demoted from frontier_release: ic=2 ai=1.0]
Read on Mastodon — sigmoid.social →
AI-generated summary · Google Gemini · from 2 sources. How we write summaries →