METR (Model Evaluation & Threat Research) has released a suite of resources for evaluating the dangerous autonomous capabilities of AI models. The release includes a task suite with 31 example tasks and summaries of 186 others, along with software tooling and guidelines for accurate measurement. The goal is to provide a practical, cost-effective method for assessing risks from autonomous AI systems, enabling the development of appropriate safety precautions.
Summary written by gemini-2.5-flash-lite from 3 sources.