I created a new benchmark and it interestingly showed the regression from Opus 4.6 -> 4.7

Anthropic's Opus 4.7 shows a regression on a new user-created benchmark

By the PulseAugur editorial team · [1 source] · 2026-06-27 20:16

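ObviousBench, a user-created benchmark, shows a performance regression in Anthropic's Opus 4.7 model relative to its predecessor, Opus 4.6. The benchmark tests how often models make simple reasoning errors, and the results show that Opus 4.7 needs significantly higher configuration settings and still scores lower than Opus 4.6. The creator suggests that Opus 4.7's overconfidence and reduced use of reasoning tokens may explain the apparent regression.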

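Impact: The result points to possible problems with model versioning and performance consistency, and invites further investigation into Anthropic's model development.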

Ranking rationale: A user-created benchmark revealed a performance regression in a specific model version.

Read on r/ClaudeAI →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/ClaudeAI TIER_2 English(EN) · /u/pawofdoom · 2026-06-27 20:16

I created a new benchmark and it interestingly showed the regression from Opus 4.6 -> 4.7

I originally created ObviousBench (https://obviousbench.com/) to measure the performance of small and low reasoning model's exposures to making 'dumb' mistakes, like not being able to spell Google, or walking to the car wash etc. …
