METR has introduced a new standard for defining and evaluating AI agent capabilities, aiming to make evaluation tasks portable and reusable across organizations. The standard specifies each task's instructions, environment setup, and scoring mechanism, which makes tasks easier to share and validate. It is already in use for over 1,000 tasks covering areas such as AI R&D and cybersecurity, and the UK AI Safety Institute is among the organizations adopting it.
Summary written by gemini-2.5-flash-lite from 1 source.