I Tested Opus 4.8 vs GPT-5.5 vs Gemini 3.1 Pro on 20 Tasks — Opus Embarrassed Both on Long Context

Anthropic's Claude Opus 4.8 Outperforms GPT-5.5 and Gemini 3.1 Pro

By the PulseAugur editorial team · [1 source] · 2026-05-29 05:57

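Anthropic has released Claude Opus 4.8. The new model reportedly outperforms GPT-5.5 and Gemini 3.1 Pro across a range of tasks, particularly those involving long context windows. It also posted a notable score on the SWE-bench benchmark, which suggests strong performance in code generation and code understanding.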

Impact: Sets a new benchmark for long-context performance, which may influence future model development and application design.

Ranking rationale: A frontier lab released a new model, accompanied by performance benchmarks.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

Towards AI · Chew Loong Nian, AI Engineer · 2026-05-29 05:57

I Tested Opus 4.8 vs GPT-5.5 vs Gemini 3.1 Pro on 20 Tasks — Opus Embarrassed Both on Long Context

Anthropic dropped Claude Opus 4.8 on May 28, 2026. The headline number that made me stop scrolling wasn’t the SWE-bench score. It was this…

https://pub.towardsai.net…