Claude Opus 4.6 solves 10 Putnam math competition problems autonomously

By PulseAugur Editorial · [1 source] · 2026-05-22 04:00

Researchers have demonstrated that Anthropic's Claude Opus 4.6, equipped with specialized Model Context Protocol (MCP) tools for the Rocq proof assistant, proved 10 of the 12 problems from the 2025 Putnam Mathematical Competition. The experiment used a "compile-first, interactive-fallback" strategy implemented through these MCP tools, which were designed by analyzing previous proof-assistant experiments. The agent ran autonomously on an isolated virtual machine, deploying 141 subagents over 17.7 hours of active computation and processing approximately 1.9 billion tokens.

IMPACT Demonstrates advanced AI reasoning capabilities on complex mathematical problems, potentially accelerating AI's role in formal verification and scientific discovery.

RANK_REASON Academic paper detailing an experiment with an AI model on a benchmark.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Guillaume Baudart, Marc Lelarge, Tristan Stérin, Jules Viennot · 2026-05-22 04:00

Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

arXiv:2603.20405v2 Announce Type: replace-cross Abstract: We report on an experiment in which Claude Opus 4.6, equipped with a suite of Model Context Protocol (MCP) tools for the Rocq proof assistant, autonomously proved 10 of 12 problems from the 2025 Putnam Mathematical Competi…
