PulseAugur
English(EN) ImProver: Agent-Based Automated Proof Optimization

AI agents show promise in program verification and theorem proving

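Researchers are exploring agent-based AI systems, particularly those built on large language models (LLMs), for complex tasks such as program verification and mathematical theorem proving. The studies report that these systems achieve high success rates in generating valid specifications and certified code, sometimes outperforming specialized models on new benchmarks. However, the research also highlights a widening gap between current AI capabilities and the rigor of existing verification benchmarks, which points to a need for more robust evaluation methods.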

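Impact: Agent-based AI systems are demonstrating advanced capabilities in formal verification, which could speed up the development and improve the reliability of complex software and mathematical proofs.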

Ranking rationale: Multiple research papers published on arXiv describe new agent-based AI frameworks for program verification and theorem proving.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

  1. arXiv cs.AI TIER_1 English(EN) · Alessandro Sosso, Akhil Arora, Bas Spitters ·

    Agentic Proving for Program Verification

    arXiv:2605.23772v1. Abstract: Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic pro…

  2. arXiv cs.AI TIER_1 English(EN) · Benjamin Breen, Marco Del Tredici, Jacob McCarran, Javier Aspuru Mijares, Weichen Winston Yin, Kfir Sulimany, Jacob M. Taylor, Frank H. L. Koppens, Dirk Englund ·

    Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

    arXiv:2510.12787v4. Abstract: We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, A…

  3. arXiv cs.AI TIER_1 English(EN) · Bas Spitters ·

    Agentic Proving for Program Verification

    Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for…

  4. arXiv cs.CL TIER_1 English(EN) · Riyaz Ahuja, Jeremy Avigad, Prasad Tetali, Sean Welleck ·

    ImProver: Agent-Based Automated Proof Optimization

    arXiv:2410.04753v2. Abstract: Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proof assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, dependin…