PulseAugur
METR releases tools and protocols to evaluate dangerous AI autonomy

METR (Model Evaluation & Threat Research) has released a suite of resources designed to evaluate the dangerous autonomous capabilities of AI models. This includes a task suite with 31 example tasks and summaries for 186 others, along with software tooling and guidelines for accurate measurement. The goal is to provide a practical and cost-effective method for assessing risks from autonomous AI systems, enabling the development of appropriate safety precautions.

Summary written by gemini-2.5-flash-lite from 3 sources.

Rank reason: release of an open-source task suite and protocol for evaluating AI safety, rather than a frontier model release or major policy change.

Read on METR (Model Evaluation & Threat Research) →

COVERAGE (3 sources)

  1. METR (Model Evaluation & Threat Research) TIER_1

    Autonomy Evaluation Resources

    METR is sharing a collection of resources (https://evaluations.metr.org/) for evaluating potentially dangerous autonomous capabilities of frontier models. These resources include a task suite, some software tooling, guidelines on how to ensure an accurate me…

  2. METR (Model Evaluation & Threat Research) TIER_1

    Example autonomy evaluation protocol

  3. METR (Model Evaluation & Threat Research) TIER_1

    Example autonomy task suite