Opus 4.8 barely moved the leaderboard. It moved the one number that decides if your agents can be trusted.

Anthropic's Claude Opus 4.8 prioritizes agent safety and a faster, cheaper mode

By the PulseAugur editorial team · [1 source] · 2026-05-31 06:10

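Anthropic has released Claude Opus 4.8, a modest update that prioritizes safety and efficiency over raw benchmark gains. The new model is one quarter as likely to ignore its own coding errors, a key improvement for autonomous agent applications. A new "fast mode" also offers significantly lower latency and cost, making it a more viable choice for high-iteration tasks.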

Impact: By reducing silent failures, the update makes agents more reliable and autonomous AI systems more trustworthy on complex tasks.

Ranking rationale: a model release from a frontier lab (Anthropic) with a specific version number.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

dev.to · LLM tag · Mirza Iqbal · 2026-05-31 06:10


Opus 4.8 shipped on 28 May 2026, 41 days after 4.7. Standard pricing did not move. Five dollars per million tokens in, twenty-five out. SWE-bench Verified nudged from 87.6 to 88.6. SWE-bench Pro climbed from 64.3 to 69.2, about five points. On GDPval-AA it posted…
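As a rough sketch of what the unchanged standard rates mean per request, here is a small cost estimate. It assumes a simple linear per-token charge at $5 per million input tokens and $25 per million output tokens, with no caching, batch discounts, or fast-mode pricing (the excerpt gives no fast-mode rates). The helper name `request_cost_usd` and the token counts in the example are hypothetical.

```python
# Standard Opus 4.8 pricing as quoted in the excerpt (USD per million tokens).
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the standard-pricing cost of one request in US dollars.

    Hypothetical helper for illustration only: a linear charge per token,
    ignoring prompt caching, batch discounts, and fast mode.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


# Hypothetical agent step: 200k tokens of context in, 20k tokens out.
print(f"${request_cost_usd(200_000, 20_000):.2f}")  # $1.50
```

Because output tokens cost five times as much as input tokens, the 20k output tokens in this example cost half as much as the 200k input tokens ($0.50 vs. $1.00), despite being a tenth of the volume.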