Evaluation of LLMs for Mathematical Formalization in Lean

Evaluating LLMs on Formalizing Mathematical Proofs in Lean 4

By the PulseAugur editorial team · [1 source] · 2026-06-06 04:00

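A new research paper evaluates how well various large language models (LLMs) generate formal mathematical proofs with the Lean 4 theorem prover. The study applies the pass@k and refine@k metrics to subsets of the miniF2F and miniCTX datasets. Gemini 3.1 Pro and Claude Opus 4.7 achieved the highest success rates, with Gemini reaching 92% on miniF2F and Opus reaching 86% on miniCTX. On cost-effectiveness, NVIDIA Nemotron 3 Super and GPT-OSS 120B delivered competitive accuracy at a lower cost per proof.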
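For readers unfamiliar with pass@k, the sketch below shows one common way to compute it: the unbiased estimator widely used in code-generation benchmarks, where n proof attempts are sampled per problem and c of them are accepted by the checker. This is an assumption for illustration only. The summary does not say which estimator the paper uses, and it does not define refine@k, so that metric is not sketched here. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled attempts succeeds.

    n: total attempts sampled for one problem
    c: attempts among those n that were verified as correct
    k: budget of attempts being scored (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures exist, so every size-k subset contains a success.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn attempts are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n, c) for one problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Under this definition, a problem with 3 verified proofs out of 10 attempts has a pass@1 of 0.3, which rises toward 1 as k grows. That is why results for the same model at different k are not directly comparable.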

Impact: The study highlights what LLMs can do in formal mathematics and may aid theorem proving and mathematical research.

Ranking rationale: This cluster contains an academic paper that evaluates LLM performance on a specific task.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Tyson Klingner, Drew Bladek, Escher Crawford, Bohao Chen, Ariel Fu, Kaira Nair, Jarod Alper, Giovanni Inchiostro, Vasily Ilin · 2026-06-06 04:00

Evaluation of LLMs for Mathematical Formalization in Lean

arXiv:2606.05632v1 Announce Type: new Abstract: Within the past few years, the ability of Large Language Models (LLMs) to generate formal mathematical proofs has improved drastically. We provide a comparison of various LLMs' effectiveness in producing formal proofs in Lean 4 with…

