METR has released guidelines for evaluating AI model capabilities, focusing on elicitation protocols used to assess potential risks. The guidelines recommend measuring a model's abilities after post-training enhancement, since realistic threat models may involve finetuning, prompting, or other modifications. Evaluating an enhanced model, rather than its raw state, yields a more accurate risk assessment because it accounts for how models might be improved or manipulated. The recommended protocol proceeds in stages: apply basic elicitation, observe the resulting failure modes, then apply targeted effort against those failure modes to gauge the capabilities that are actually reachable.
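The staged protocol described above can be sketched as a simple loop. This is an illustrative assumption of how such an evaluation harness might be structured, not METR's actual tooling; all names (`run_basic_eval`, `elicit`, the scoring and failure-mode logic) are hypothetical stubs.

```python
# Hypothetical sketch of staged elicitation: basic eval -> observe failure
# modes -> targeted fixes -> re-evaluate. All names and numbers are
# illustrative stubs, not METR's actual methodology or tooling.
from dataclasses import dataclass, field

@dataclass
class EvalResult:
    score: float                           # fraction of tasks solved
    failure_modes: list = field(default_factory=list)

def run_basic_eval(model, prompt_tweaks):
    # Stub: a real harness would run the model on a task suite and
    # categorize its failures. Here, each tweak "fixes" some tasks.
    solved = 0.4 + 0.1 * len(prompt_tweaks)
    modes = [] if solved >= 0.6 else ["gives_up_early"]
    return EvalResult(score=min(solved, 1.0), failure_modes=modes)

def elicit(model, budget=3):
    """Iteratively apply targeted fixes for observed failure modes."""
    tweaks = []
    result = run_basic_eval(model, tweaks)     # stage 1: basic elicitation
    for _ in range(budget):
        if not result.failure_modes:           # stage 2: nothing left to fix
            break
        # stage 3: targeted effort, e.g. one prompt patch per failure mode
        tweaks.extend(f"patch:{m}" for m in result.failure_modes)
        result = run_basic_eval(model, tweaks)
    return result

best = elicit(model="stub-model")
print(round(best.score, 2))
```

The point of the loop is that the reported capability score is the one reached after targeted effort, not the raw first-pass score.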
Summary written by gemini-2.5-flash-lite from 1 source.