A new AI model named o3 has demonstrated significant advances across several challenging benchmarks. It solved the AIME, GPQA, and Codeforces benchmarks, indicating strong capabilities in mathematics, question answering, and coding. Furthermore, o3 achieved the equivalent of 11 years of progress on the ARC-AGI benchmark and a 25% improvement on FrontierMath.
Summary written by gemini-2.5-flash-lite from 1 source.