Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN)

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

Researchers have introduced SorryDB, a benchmark for evaluating how well AI systems complete real-world formalization tasks in the Lean proof assistant. Unlike static benchmarks, SorryDB is continuously updated with open tasks drawn from GitHub projects. The goal is to steer AI tools toward what the Lean community actually needs, including tasks with complex dependencies. In initial evaluations, an agentic approach built on Gemini Flash performed best, but it is not strictly superior to other large language models, specialized provers, or curated Lean tactics. This suggests that current AI approaches to formal mathematics are complementary.
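For readers new to Lean, the following Lean 4 sketch illustrates what "completing" a theorem means. It assumes, based on the benchmark's name, that tasks take the form of `sorry` placeholders. `sorry` is Lean's built-in way to admit a goal without proof, and Lean accepts such a declaration with a warning. The statement is a toy example, not a task from SorryDB. The tactic list in the last declaration is a hypothetical stand-in for a curated-tactics baseline, not the one the authors used.

```lean
-- Toy statement (hypothetical, not from SorryDB). The proof is left as
-- `sorry`, so Lean accepts it but warns "declaration uses 'sorry'".
example (a b : Nat) : a + b = b + a := by
  sorry

-- One way a prover might close the goal: cite a lemma from Lean core.
example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- A crude "try fixed tactics in order" baseline (illustrative only):
-- `first` runs each alternative until one succeeds; here `omega` does.
example (a b : Nat) : a + b = b + a := by
  first | omega | simp | decide
```

Real tasks from GitHub projects are presumably harder than this toy case mainly because their goals depend on definitions and lemmas elsewhere in the host project.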

IMPACT This benchmark could accelerate the development of AI agents that contribute to formal mathematics and reason over complex dependencies.

GitHub
arXiv
Lean
Gemini Flash
SorryDB
Austin Letson