MirrorCode: AI can rebuild entire programs from behavior alone

MirrorCode benchmarks AI's ability to rebuild software from behavior alone · 2 sources tracked

By PulseAugur Editorial Team · [2 sources] · 2026-06-29 11:57

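Researchers have introduced MirrorCode, a new benchmark that evaluates AI's ability to reconstruct entire software projects from observed behavior alone, without access to the original source code. The benchmark comprises 25 diverse target programs, including Unix utilities and bioinformatics tools, and requires AI agents to match the original program's output exactly across a range of tests. Current AI models already reach 56% accuracy on MirrorCode, demonstrating their capability on long-horizon software engineering tasks such as reimplementing gotree, a 16,000-line bioinformatics toolkit. The development of MirrorCode suggests that AI will substantially change software engineering as autonomous agents continue to improve.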
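The exact-output requirement described above can be illustrated with a small scorer. This is a minimal sketch, not the benchmark's actual harness, which the source does not describe: the names `Case`, `run`, and `match_rate` are hypothetical, and comparing the exit code alongside standard output is an assumption made for illustration.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One behavioral test: command-line arguments plus text for stdin."""
    args: tuple = ()
    stdin: str = ""


def run(cmd, case, timeout=10.0):
    """Run `cmd` on one case and return (exit code, stdout).

    A timeout raises subprocess.TimeoutExpired; this sketch does not catch it.
    """
    proc = subprocess.run(
        [*cmd, *case.args],
        input=case.stdin,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def match_rate(reference_cmd, candidate_cmd, cases):
    """Fraction of cases where the candidate reproduces the reference exactly."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = sum(run(reference_cmd, c) == run(candidate_cmd, c) for c in cases)
    return passed / len(cases)


if __name__ == "__main__":
    # Toy stand-ins for an original program and its reimplementation.
    reference = [sys.executable, "-c",
                 "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    candidate = [sys.executable, "-c",
                 "import sys; sys.stdout.write(sys.stdin.read().upper().strip())"]
    cases = [Case(stdin="abc"), Case(stdin="abc\n")]
    print(match_rate(reference, candidate, cases))
```

In this toy run the candidate drops the trailing newline on the second case, so it matches on only one of the two cases. Exact matching is strict by design: a reimplementation that is "almost right" on formatting still fails the case.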

Impact: This benchmark is expected to accelerate AI progress in autonomous coding and software engineering.

Ranking rationale: This cluster describes a new benchmark and research paper for evaluating AI capabilities in software engineering.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Tom Adamczewski, David Owen, David Rein, Florian Brand, Giles Edkins, Allen Hart, Daniel O'Connell · 2026-06-30 04:00

MirrorCode: AI can rebuild entire programs from behavior alone

arXiv:2606.30182v1 Announce Type: new Abstract: AI models are rapidly improving at autonomous coding, as shown by benchmark progress and one-off demonstrations such as AI implementing a C compiler. However, existing coding benchmarks tend to focus on shorter tasks, and one-off de…
arXiv cs.AI TIER_1 English(EN) · Daniel O'Connell · 2026-06-29 11:57

MirrorCode: AI can rebuild entire programs from behavior alone

AI models are rapidly improving at autonomous coding, as shown by benchmark progress and one-off demonstrations such as AI implementing a C compiler. However, existing coding benchmarks tend to focus on shorter tasks, and one-off demonstrations are hard to compare systematically …
