Anthropic's Claude model has achieved a significant milestone, reaching a 3-4 hour task horizon for its METR 80% predictions. This performance matches the predictions made by top superforecasters in early May. The achievement was noted by Ethan Mollick on Bluesky. AI
IMPACT Demonstrates significant progress in AI's ability to handle complex, multi-hour tasks, potentially impacting future AI agent capabilities.
RANK_REASON The cluster reports on a specific benchmark achievement for an AI model, aligning with expert predictions. [lever_c_demoted from research: ic=1 ai=1.0]
Read on Bluesky Jetstream — AI desk →
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →