Researchers have demonstrated that Anthropic's Claude Opus 4.6, equipped with specialized tools for the Rocq proof assistant, proved 10 of the 12 problems from the 2025 Putnam Mathematical Competition. The experiment used a "compile-first, interactive-fallback" strategy implemented through Model Context Protocol (MCP) tools, whose design was informed by earlier proof-assistant experiments. The AI agent ran autonomously on an isolated virtual machine, launching 141 subagents over 17.7 hours of active computation and processing approximately 1.9 billion tokens.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Demonstrates advanced AI reasoning on competition-level mathematics, which could accelerate AI's role in formal verification and scientific discovery.
RANK_REASON Academic paper describing an experiment that evaluates an AI model on a benchmark.