An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Claude Opus 4.7 leads AI models on code-reconstruction benchmark

By the PulseAugur editorial team · [1 source] · 2026-06-26 17:24

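Epoch AI has developed the MirrorCode benchmark to evaluate whether AI models can rebuild complete programs without access to the original code. Anthropic's Claude Opus 4.7 performed strongly, successfully rebuilding a 16,000-line toolkit in 14 hours, with a solve rate of 56%. Even so, current AI models still struggle with the most complex programming tasks.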

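Impact: The benchmark highlights the limitations of current AI in complex code generation and sets a new standard for evaluating AI programming ability.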

Ranking rationale: This cluster describes a new AI model benchmark and a specific model's performance on that benchmark.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

The Decoder (English) · Matthias Bastian · 2026-06-26 17:24

An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Epoch AI's new MirrorCode benchmark tests whether AI mod…
