PulseAugur
Can Coding Agents Reproduce Findings in Computational Materials Science?

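Researchers have developed AutoMat, a new benchmark that tests how well AI coding agents can reproduce findings from computational materials science papers. The benchmark evaluates whether an agent can reconstruct complex scientific workflows, navigate specialized toolchains, and interpret results to support or refute scientific claims. Current LLM-based agents have low success rates: the best-performing setup reaches only 54.1%, which highlights their limitations in handling incomplete procedures and methodological deviations.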

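Impact: Highlights the current limitations of AI agents in scientific reproducibility and indicates a need for better domain-specific reasoning and workflow reconstruction.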

Ranking rationale: This cluster contains an academic paper that introduces a new benchmark for evaluating AI agents.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Ziyang Huang, Yi Cao, Ali K. Shargh, Jing Luo, Ruidong Mei, Mohd Zaki, Zhan Liu, Wyatt Bunstine, William Jurayj, Somdatta Goswami, Tyrel McQueen, Michael Shields, Jaafar El-Awady, Paulette Clancy, Benjamin Van Durme, Nicholas Andrews, William Walden, Dani ·

    Can Coding Agents Reproduce Findings in Computational Materials Science?

    arXiv:2605.00803v1 Announce Type: cross Abstract: Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benchmarks. However, it is unclear whether such success transfers to computational…

  2. arXiv cs.CL TIER_1 English(EN) · Daniel Khashabi ·

    Can Coding Agents Reproduce Findings in Computational Materials Science?

    Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benchmarks. However, it is unclear whether such success transfers to computational scientific workflows, where tasks require not onl…