Fable improved our hardest agent benchmark by 23.7% in one day, this feels like a tipping point in recursive intelligence

Anthropic's Fable model raises agent benchmark performance by 23.7%

By the PulseAugur editorial team · [1 source] · 2026-06-12 17:44

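A user reports that Anthropic's Fable model raised their score on an internal agent benchmark by 23.7% within a single day. The user describes Fable as highly capable at grasping nuance and identifying the root causes of errors, which led to improvements in agent performance that generalise more broadly. The post frames this as a potential tipping point for recursive intelligence, in which a model improves itself autonomously through a trace-analyse-patch-evaluate loop.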

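Impact: Demonstrates the potential for AI agents to improve themselves rapidly, accelerating the development of recursive intelligence.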

Ranking rationale: A user reported a benchmark improvement for a specific model.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/ClaudeAI TIER_2 English(EN) · /u/Lucky_Historian742 · 2026-06-12 17:44

Fable improved our hardest agent benchmark by 23.7% in one day, this feels like a tipping point in recursive intelligence

I've experimented with Claude Code for autoresearch and harness optimisation style loops for improving agents for a while now. The workflow looks like this: collect traces, analyse traces to find improvements, patch the agent, make evals, repeat.…
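The workflow quoted above (collect traces, analyse, patch, evaluate, repeat) can be illustrated with a toy sketch. This is not the poster's harness: the task table, the `Trace` type, the skill names, and every function below are hypothetical stand-ins. In a real setup the analysis and patch steps would presumably be carried out by a model rather than by a frequency count, and the evals would run against a real benchmark.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trace:
    """One recorded run of the agent on one task (hypothetical shape)."""
    task: str
    passed: bool
    failure_tag: Optional[str]  # root-cause label; None when the task passed


# Toy benchmark (assumption): each task needs exactly one "skill".
TASKS = {
    "t1": "parse_dates",
    "t2": "retry_on_timeout",
    "t3": "parse_dates",
    "t4": "quote_paths",
}


def collect_traces(skills: frozenset) -> list:
    """Step 1: run every task and record whether the agent had the skill."""
    return [
        Trace(task, need in skills, None if need in skills else need)
        for task, need in TASKS.items()
    ]


def analyse(traces: list) -> Optional[str]:
    """Step 2: pick the most frequent failure cause, or None if all passed."""
    tags = Counter(t.failure_tag for t in traces if not t.passed)
    return tags.most_common(1)[0][0] if tags else None


def evaluate(skills: frozenset) -> float:
    """Step 4: fraction of tasks passed."""
    traces = collect_traces(skills)
    return sum(t.passed for t in traces) / len(traces)


def improve(skills: frozenset, max_rounds: int = 10):
    """Repeat the loop, keeping a patch only when the eval score rises."""
    history = [evaluate(skills)]
    for _ in range(max_rounds):
        cause = analyse(collect_traces(skills))
        if cause is None:
            break  # nothing left to fix
        candidate = skills | {cause}  # Step 3: patch the agent
        score = evaluate(candidate)
        if score <= history[-1]:
            break  # patch did not help; stop rather than regress
        skills = candidate
        history.append(score)
    return skills, history


final_skills, history = improve(frozenset())
print(history)
```

Gating each patch on the eval score is a design choice made for this sketch: it keeps the loop from accepting a change that only looks plausible in the traces but does not move the benchmark.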
