4.6 vs 4.8 with codex as judge

Claude 4.8 outperforms 4.6 on codebase tasks, but is more verbose

By the PulseAugur editorial team · [1 source] · 2026-06-05 08:17

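A user ran a non-scientific comparison of Claude Opus 4.6 and 4.8, using Codex 5.5 as the judge. The results suggest that Claude 4.8, although slower and more verbose, performed better overall at understanding the codebase and detecting risks. Codex 5.5, acting as judge, also remarked that while Claude 4.8 is the more thorough investigator, its own output would be more concise and efficient.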

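Impact: Suggests incremental model improvements in code understanding and risk detection, while highlighting the trade-off against verbosity and efficiency.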

Ranking rationale: A user-run benchmark comparing two versions of the same model.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

r/ClaudeAI · /u/rickythefox · 2026-06-05 08:17

4.6 vs 4.8 with codex as judge

A very non-scientific test - I asked codex 5.5 xhigh to give claude the task of adding a feature to a medium-sized legacy codebase using 4.6 and 4.8 on max.

The verdict confirms what I think we already know - 4.8 is better overall but is …

