A new evaluation metric called GDPVal suggests that Anthropic's Claude Opus 4.1 performs at 95% of human-expert level across 44 white-collar professions. The metric aims to measure an AI's capability on tasks typically performed by highly skilled human workers. The findings point to a significant advance in AI's ability to handle complex professional duties.
Summary written by gemini-2.5-flash-lite from 1 source.
RANK_REASON New evaluation metric and benchmark result for an existing model.