Process-Verified Reinforcement Learning for Theorem Proving via Lean
Researchers have developed a new reinforcement learning approach called Process-Verified Reinforcement Learning (PVRL) that leverages the Lean proof assistant to provide dense, structured feedback during training. This method uses Lean's ability to parse proof attempts into tactic sequences, offering fine-grained, verifier-grounded signals that go beyond simple binary success or failure. Experiments with STP-Lean and DeepSeek-Prover-V1.5 demonstrated that this tactic-level supervision improves performance on benchmarks like MiniF2F and ProofNet compared to outcome-only methods. The study suggests that symbolic proof assistants can function as process-level reward oracles, merging the scalability of language models with the reliability of symbolic verification for formal reasoning.
AI IMPACT: This research could enhance the reliability and scalability of AI systems in formal reasoning tasks by combining language model capabilities with symbolic verification.
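To make the contrast between outcome-only and tactic-level feedback concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the study's implementation: the `TacticResult` structure, the function names, and the reward values (+1, -1, 0) are all hypothetical, and the per-tactic `ok` flags stand in for whatever feedback Lean returns when it checks a parsed proof attempt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TacticResult:
    """One tactic from a parsed proof attempt (hypothetical structure)."""
    tactic: str
    ok: bool  # assumed flag: True if the verifier accepted this tactic


def outcome_reward(results: List[TacticResult]) -> List[float]:
    """Outcome-only baseline: one binary signal shared by every tactic."""
    success = 1.0 if results and all(r.ok for r in results) else 0.0
    return [success] * len(results)


def process_reward(results: List[TacticResult]) -> List[float]:
    """Tactic-level signal (assumed scheme, not taken from the study):
    +1 for each accepted tactic before the first error,
    -1 for the first failing tactic,
     0 for everything after it, since later steps cannot be judged."""
    rewards: List[float] = []
    failed = False
    for r in results:
        if failed:
            rewards.append(0.0)
        elif r.ok:
            rewards.append(1.0)
        else:
            rewards.append(-1.0)
            failed = True
    return rewards


# A made-up proof attempt whose third tactic is rejected.
attempt = [
    TacticResult("intro n", True),
    TacticResult("induction n", True),
    TacticResult("simp", False),
    TacticResult("omega", True),
]

print(outcome_reward(attempt))  # [0.0, 0.0, 0.0, 0.0]
print(process_reward(attempt))  # [1.0, 1.0, -1.0, 0.0]
```

With the outcome-only signal, every tactic in the failed attempt receives the same zero reward, so the two correct opening steps are indistinguishable from the faulty one. The tactic-level signal credits the valid prefix and localizes the error, which is the kind of denser feedback the summary attributes to verifier-grounded supervision.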