The Model Evaluation & Threat Research (METR) organization has published guidelines for assessing AI model capabilities through elicitation techniques. The guidelines aim to measure a model's potential performance after some level of post-training enhancement, rather than its out-of-the-box state. The process begins with basic elicitation, then analyzes the remaining failure modes to determine whether they could be fixed with modest additional effort. METR emphasizes accounting for finetuning, prompting, and tooling in threat modeling, especially for open-source or otherwise modifiable models.
AI summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a framework for evaluating AI model safety and potential risks through structured capability elicitation.
RANK_REASON Publication of guidelines for AI model capability elicitation by a research organization.