Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification

New EVA method improves LLM reward modeling for mathematics verification

By the PulseAugur editorial team · [1 source] · 2026-06-02 04:00

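Researchers have introduced Expected Value Alignment (EVA), a procedure for training the reward models that large language models rely on in formal mathematics verification. EVA addresses a trade-off in existing reward models by extracting a continuous score from the model's token distribution while still keeping a discrete textual rationale. Implemented in a model called Leibniz for Lean 4 formal verification, the method shows fewer discretization artifacts than baseline methods.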
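The summary does not describe EVA's training objective, so the sketch below only illustrates the general idea of reading a continuous score off a token distribution instead of taking the single most likely score token. It assumes a hypothetical verdict vocabulary of five score tokens ("0" to "4") mapped to values in [0, 1]; these tokens, their values, and the function names are illustrative assumptions, not details from the paper or from Leibniz.

```python
import math
from typing import Dict

# Hypothetical score vocabulary: each score token maps to a numeric value.
# This mapping is an assumption for illustration, not taken from the paper.
SCORE_VALUES: Dict[str, float] = {str(i): i / 4 for i in range(5)}


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a token -> logit mapping."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def expected_score(logits: Dict[str, float], values: Dict[str, float]) -> float:
    """Continuous score: expectation of the value under the token distribution."""
    probs = softmax(logits)
    return sum(p * values[tok] for tok, p in probs.items())


def argmax_score(logits: Dict[str, float], values: Dict[str, float]) -> float:
    """Discrete score: value of the single most likely token (ties -> first key)."""
    best = max(logits, key=logits.get)
    return values[best]


if __name__ == "__main__":
    # The model is split evenly between "3" and "4"; other tokens are negligible.
    logits = {"0": -1e9, "1": -1e9, "2": -1e9, "3": 1.0, "4": 1.0}
    print("argmax:", argmax_score(logits, SCORE_VALUES))      # snaps to one bucket
    print("expected:", expected_score(logits, SCORE_VALUES))  # lands between them
```

In this toy case the argmax reading snaps to 0.75 even though the model is equally confident in 1.0, while the expected value (0.875) reflects that split. This is one plausible reading of what "discretization artifacts" refers to, stated here as an assumption.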

Impact: This approach could improve the accuracy and interpretability of AI systems used for formal mathematical reasoning.

Ranking rationale: This cluster contains a paper detailing a new approach to generative reward modeling.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Shihao Ji, Haotao Tan, Zihui Song, Mingyu Li · 2026-06-02 04:00

Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification

arXiv:2606.01160v1 · Announce Type: new

Abstract: Large Language Models (LLMs) are increasingly used with formal interactive theorem provers such as Lean 4. Scaling these systems with reinforcement learning or search methods requires process reward models (PRMs) that can evaluate i…

