Distilling LLM Feedback for Lean Theorem Proving
Researchers have introduced Feedback Distillation, a training method for improving large language models on complex reasoning tasks such as theorem proving. A language model generates feedback, and that feedback is used to provide token-level supervision for the model being trained. In experiments in the Lean4 theorem-proving environment, Feedback Distillation yields more diverse generated solutions and scales better than traditional methods such as GRPO, and it can also serve as a strong initialization for GRPO.
AI IMPACT: Introduces a novel training paradigm that enhances LLM capabilities in formal reasoning, potentially improving performance on complex symbolic tasks.
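
The summary above does not state the training objective, so the following is only a rough sketch of what token-level supervision from feedback could look like. It assumes that a feedback-conditioned model acts as a teacher and that the trained model is pulled toward the teacher's next-token distribution through a per-token KL divergence. The KL choice, the function names, and the toy logits are all illustrative assumptions, not the method from the paper.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_level_kl(teacher_logits, student_logits):
    """Return KL(teacher || student) at each token position.

    Both arguments are lists of per-position logit vectors over the same
    vocabulary. Assumption: the teacher logits come from a model that has
    seen the feedback; the student logits come from the model being trained.
    """
    losses = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row)
        q = softmax(s_row)
        losses.append(sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0))
    return losses

def distillation_loss(teacher_logits, student_logits):
    """Average the per-token KL terms into one scalar loss."""
    per_token = token_level_kl(teacher_logits, student_logits)
    return sum(per_token) / len(per_token)

# Toy example: 2 token positions, vocabulary of 3 (hypothetical values).
teacher = [[2.0, 0.5, 0.1], [0.1, 0.2, 3.0]]
student = [[1.0, 1.0, 1.0], [0.1, 0.2, 3.0]]
per_token = token_level_kl(teacher, student)
loss = distillation_loss(teacher, student)
```

In this toy case the second position contributes no loss because the student already matches the teacher there, while the first position carries a positive signal. That position-by-position signal is what distinguishes token-level supervision from a single scalar reward per proof attempt.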