ProgramBench result for Fable 5 is in, doubling Opus 4.8 even with 4.8 fallback "99% of the runs"

Fable 5 benchmark shows performance double that of Opus 4.8

By the PulseAugur editorial team · [1 source] · 2026-06-16 13:54

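A new benchmark, ProgramBench, has been used to evaluate Fable 5, and the results show it substantially outperforms Opus 4.8. The benchmark's creator noted that Fable 5 still scored twice as high as Opus 4.8 even though it fell back to Opus 4.8 in what the source post describes as "99% of the runs". One notable observation: when Fable 5 fell back to Opus 4.8, it consumed twice as many tokens as Opus 4.8 running comparable tasks on its own.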

Impact: Fable 5 scoring twice as high as Opus 4.8 on ProgramBench suggests a significant jump in capability that could put pressure on competitors.

Model release

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

r/singularity · /u/reefine · 2026-06-16 13:54

ProgramBench result for Fable 5 is in, doubling Opus 4.8 even with 4.8 fallback "99% of the runs"

https://x.com/ValsAI/status/2066760552156971291

Quite interesting result, ProgramBench creator seem to imply that there is a difference between Fable 5 falling back to 4.8 quick…
