PulseAugur
research · [1 source] ·

Claude Opus 4.1 nears human expert performance across 44 white-collar jobs

GDPval, a new evaluation benchmark from OpenAI's Evals team, suggests that Anthropic's Claude Opus 4.1 is performing at 95% of human expert level across 44 white-collar occupations. The benchmark covers 1,320 tasks in predominantly digital occupations and assesses AI models against human experts with an average of 14 years of experience. The result points to a significant advance in AI's ability to handle complex professional work.

Summary written by gemini-2.5-flash-lite from 1 source.

RANK_REASON New evaluation metric and benchmark result for an existing model.


COVERAGE [1]

  1. Smol AINews TIER_1 ·

    GDPVal finding: Claude Opus 4.1 within 95% of AGI (human experts in top 44 white collar jobs)

    **OpenAI**'s Evals team released **GDPval**, a comprehensive evaluation benchmark covering 1,320 tasks across 44 predominantly digital occupations, assessing AI models against human experts with 14 years average experience. Early results show **Claude 4.1 Opus** outperforming hum…