Lean
PulseAugur coverage of Lean — every cluster mentioning Lean across labs, papers, and developer communities, ranked by signal.
-
Neuralese training method may improve AI alignment via verifiable rewards
The concept of "Neuralese," a method for training AI models, is explored as a potentially beneficial approach for AI alignment. This method leverages Reinforcement Learning with Verifiable Rewards (RLVR) to optimize com…
-
New evaluation framework tests software security by varying implementations, not just AI models
This post proposes a multidimensional evaluation framework for assessing the security of software, particularly in the context of AI-assisted development. Instead of solely varying the AI model being tested, the author …
-
New LCS-Bench benchmark challenges AI models in theory-scale auto-formalization
Researchers have introduced LCS-Bench, a new benchmark designed to evaluate theory-scale auto-formalization in computer science logic. This benchmark, built using a semi-automated agentic pipeline, comprises 327 textboo…
-
AI assists mathematicians in translating proofs to formal languages
Mathematicians are beginning to use AI tools to translate complex mathematical proofs into computer-understandable formal languages, fulfilling a 12-year-old prediction by mathematician Terence Tao. This shift aims to e…
-
Lean proof assistant enhances reinforcement learning for theorem proving
Researchers have developed a novel method for theorem proving using reinforcement learning, integrating the Lean proof assistant to provide detailed, verified feedback. This approach, termed Process-Verified Reinforceme…
-
Pramaana Labs raises $27M for AI formal verification
Pramaana Labs has secured $27 million in seed funding, led by Khosla Ventures, to develop AI systems with formal verification. The startup aims to enhance reliability in high-stakes sectors such as law, drug discovery, …
-
New AI benchmark SorryDB tests real-world math formalization
Researchers have introduced SorryDB, a novel benchmark designed to evaluate AI's ability to complete real-world formalization tasks in the Lean mathematical proof assistant. Unlike static benchmarks, SorryDB is dynamica…
-
New BASE method cuts LLM math reasoning formalization costs by 5x
Researchers have developed a new method called BASE for improving the efficiency of answer selection in mathematical reasoning tasks using large language models (LLMs) and the formal proof assistant Lean. BASE reduces c…
-
New method converts formal math to natural language for AI proofs
A new paper introduces "Symbolic Informalization," a method for converting formal mathematics into human-readable natural language without losing precision. This technique is particularly useful for explaining proofs ge…
-
New LLM frameworks and benchmarks advance formal mathematical reasoning
Researchers are developing new methods and benchmarks to improve the formal mathematical reasoning capabilities of large language models (LLMs). One approach, Diffusion-Proof, utilizes diffusion LLMs (dLLMs) for theorem…
-
New system bridges math literature and formal proof libraries
Researchers have developed a novel bridge-database designed to connect mathematical literature with formal proof libraries. This system aims to unify access to published mathematical results and their formalizations, wh…
-
Pythagoras-Prover achieves state-of-the-art in efficient formal proving
Researchers have introduced Pythagoras-Prover, a new family of theorem provers designed for efficiency in formal reasoning tasks. These models utilize curriculum training and augmented formalization techniques to overco…
-
Trellis system uses LLM agents for rigorous mathematical proof formalization
Researchers have developed Trellis, an autoformalization system designed to assist in creating rigorous mathematical proofs. The system utilizes LLM agents within a structured workflow to refine natural language proofs …
-
New benchmark reveals AI struggles with verified code generation
A new benchmark called AlgoVeri has been developed to evaluate the performance of AI models in generating formally verified code for classical algorithms. The benchmark tests models across three languages: Dafny, Verus,…
-
Axiom AI solves 12 Putnam math exam problems, nears human score
Axiom, a seven-month-old startup, has achieved a significant milestone by solving 12 problems on the prestigious Putnam undergraduate math exam, scoring 8/12. This accomplishment places their AI system closer to top hum…
-
New framework ECP formally solves math answer-construction problems
Researchers have developed a new neuro-symbolic framework called Enumerate-Conjecture-Prove (ECP) designed to tackle answer-construction problems in formal mathematics. This framework combines general large language mod…
-
AI finds 50-year-old flaw in widely used economic theorem
Axiom Math's formal verification system, EconLib, has identified a flaw in a 50-year-old economic theorem by Robert Aumann, which has been widely used in fields like information economics and antitrust law. The AI syste…
-
AI models improve code generation with new verification techniques
Researchers have developed new methods to improve the ability of large language models to generate correct code and proofs. One approach, TTRL-CoCoV, uses confidence-conditioned verification to enhance coverage and accu…
-
LLMs optimized for efficient formal theorem proving in Lean
Two new research papers explore methods to improve the efficiency and effectiveness of large language models (LLMs) in formal theorem proving within the Lean environment. The first paper introduces an action routing age…
-
New FVSpec benchmark tests AI in formal software verification
Researchers have introduced FVSpec, a new benchmark designed to evaluate AI models and agents in formal software verification tasks. The benchmark involves translating property-based tests from Python into specification…