New Benchmark TestEvo-Bench Evaluates AI Agents' Ability to Co-Evolve Code and Tests

By the PulseAugur Editorial Team · [2 sources] · 2026-07-02 17:35

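Researchers have introduced TestEvo-Bench, a new benchmark designed to evaluate how well AI agents can co-evolve tests alongside code changes. The benchmark includes tasks that require generating new tests and updating existing ones, and these tasks are grounded in real commit histories and executable environments. To reduce data leakage, TestEvo-Bench periodically mines new tasks; the current snapshot contains more than 1,200 tasks drawn from 152 open-source Java projects.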

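Impact: The benchmark could help improve AI agents' ability to understand and generate both code and tests, which may in turn lead to more capable software development tools.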

Ranking rationale: This cluster contains an academic paper that introduces a new benchmark for AI research.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Jiale Amber Wang, Kaiyuan Wang, Pengyu Nie · 2026-07-03 04:00

TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

arXiv:2607.02469v1 Announce Type: cross Abstract: Software tests and code evolve together: a code change should be followed by new or updated tests that record the new software behavior. Yet existing test generation and update benchmarks often isolate the test from the code chang…
arXiv cs.AI TIER_1 English(EN) · Pengyu Nie · 2026-07-02 17:35

TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

Software tests and code evolve together: a code change should be followed by new or updated tests that record the new software behavior. Yet existing test generation and update benchmarks often isolate the test from the code change, and rely on static metadata that does not verif…
