PulseAugur
EN
LIVE 19:32:14

Anthropic's Claude model achieves 3-4 hour METR task horizon

Anthropic's Claude model has achieved a significant milestone, reaching a 3-4 hour task horizon for its METR 80% predictions. This performance matches the predictions made by top superforecasters in early May. The achievement was noted by Ethan Mollick on Bluesky. AI

IMPACT Demonstrates significant progress in AI's ability to handle complex, multi-hour tasks, potentially impacting future AI agent capabilities.

RANK_REASON The cluster reports on a specific benchmark achievement for an AI model, aligning with expert predictions. [lever_c_demoted from research: ic=1 ai=1.0]

Read on Bluesky Jetstream — AI desk →

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

COVERAGE [1]

  1. Bluesky Jetstream — AI desk TIER_1 English(EN) · emollick.bsky.social ·

    In early May, the best superforecasters predicted that, by the end of the year, the longest METR 80% task horizons would reach 3-4 hours.

    In early May, the best superforecasters predicted that, by the end of the year, the longest METR 80% task horizons would reach 3-4 hours. In late May, Claude Mythos achieved that number.