PulseAugur
实时 19:32:08
English(EN) In early May, the best superforecasters predicted that, by the end of the year, the longest METR 80% task horizons would reach 3-4 hours.

Anthropic 的 Claude 模型实现了 3-4 小时的 METR 任务时限

AnthropicClaude 模型在 METR 80% 的预测中达到了 3-4 小时的任务时限这一重要里程碑。这一表现与五月初顶尖超级预测者的预测相符。Ethan MollickBluesky 上注意到了这一成就。 AI

影响 展示了 AI 处理复杂、多小时任务能力的重大进展,可能影响未来 AI 代理的能力。

排序理由 该集群报告了 AI 模型的一项特定基准成就,与专家预测一致。 [lever_c_demoted from research: ic=1 ai=1.0]

在 Bluesky Jetstream — AI desk 阅读 →

AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →

报道来源 [1]

  1. Bluesky Jetstream — AI desk TIER_1 English(EN) · emollick.bsky.social ·

    In early May, the best superforecasters predicted that, by the end of the year, the longest METR 80% task horizons would reach 3-4 hours.

    In early May, the best superforecasters predicted that, by the end of the year, the longest METR 80% task horizons would reach 3-4 hours. In late May, Claude Mythos achieved that number.