Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 1d

Distilling LLM Feedback for Lean Theorem Proving

Researchers have developed Feedback Distillation, a training method aimed at improving large language models on complex reasoning tasks such as theorem proving. In this technique, a language model generates feedback, which is then converted into token-level supervision for the model being trained. Experiments in the Lean4 theorem-proving environment show that Feedback Distillation produces more diverse solutions and scales better than standard reinforcement-learning methods such as GRPO (Group Relative Policy Optimization). It can also serve as a strong initialization for subsequent GRPO training.
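The brief does not spell out the training objective, so the Python sketch below only contrasts the two kinds of learning signal it mentions, under stated assumptions. GRPO normalizes one scalar reward per sampled solution against the rest of its group and applies that single advantage to every token of the solution. A token-level objective, by contrast, can assign a different loss to each token. The feedback-conditioned "teacher" distributions and the KL form of the token loss are hypothetical stand-ins, not the paper's method, and all helper names are illustrative.

```python
import math
from typing import List, Sequence


def grpo_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style group-relative advantage: normalize each sampled
    solution's scalar reward by the group mean and (population) std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def broadcast_to_tokens(advantage: float, num_tokens: int) -> List[float]:
    """Sequence-level signal: every token of a solution shares one advantage."""
    return [advantage] * num_tokens


def token_kl(
    student: Sequence[Sequence[float]],
    teacher: Sequence[Sequence[float]],
    floor: float = 1e-12,
) -> List[float]:
    """Token-level signal: KL(teacher || student) at each position.

    HYPOTHETICAL: assumes `teacher` holds next-token distributions from a
    model that has seen LLM-generated feedback on the attempt. The actual
    Feedback Distillation objective may differ.
    """
    losses = []
    for p_s, p_t in zip(student, teacher):
        # Clamp student probabilities to avoid division by zero.
        kl = sum(t * math.log(t / max(s, floor)) for t, s in zip(p_t, p_s) if t > 0)
        losses.append(kl)
    return losses


if __name__ == "__main__":
    # Four sampled proof attempts; assumed reward is 1.0 if Lean4 accepts the proof.
    adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 6) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
    print(broadcast_to_tokens(adv[1], 3))  # same value for all 3 tokens

    # Two token positions over a toy 3-word vocabulary.
    student = [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]
    teacher = [[0.5, 0.25, 0.25], [0.8, 0.1, 0.1]]
    print(token_kl(student, teacher))  # zero at position 0, positive at position 1
```

In this toy case the sequence-level signal penalizes all tokens of a failed attempt equally, while the token-level loss is nonzero only at the position where the student and the assumed feedback-informed teacher disagree.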

IMPACT · Introduces a new training paradigm that strengthens LLM capabilities in formal reasoning and could improve performance on complex symbolic tasks.

GRPO
Lean4
Feedback Distillation