Claude Opus 4.1 nears human expert performance across 44 white-collar jobs

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

A new evaluation benchmark called GDPval, released by OpenAI's Evals team, suggests that Anthropic's Claude Opus 4.1 performs at 95% of human expert level across 44 white-collar occupations. The benchmark measures AI capability on tasks typically performed by highly skilled human workers. The findings indicate a significant advance in AI's ability to handle complex professional work.


RANK_REASON New evaluation metric and benchmark result for an existing model.


COVERAGE [1]

Smol AINews TIER_1 · 2025-09-25 05:44

GDPVal finding: Claude Opus 4.1 within 95% of AGI (human experts in top 44 white collar jobs)

**OpenAI**'s Evals team released **GDPval**, a comprehensive evaluation benchmark covering 1,320 tasks across 44 predominantly digital occupations, assessing AI models against human experts with 14 years average experience. Early results show **Claude 4.1 Opus** outperforming hum…

