The Model Evaluation & Threat Research (METR) organization has published guidelines for assessing AI model capabilities through elicitation techniques. The guidelines aim to measure a model's potential performance after some level of post-training enhancement, rather than its out-of-the-box state. The process begins with basic elicitation, then analyzes the remaining failure modes to determine whether they could be fixed with modest additional effort. METR emphasizes accounting for finetuning, prompting, and tooling in threat modeling, especially for open-source or otherwise modifiable models.
AI summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a framework for evaluating AI model safety and potential risks through structured capability elicitation.
RANK_REASON Publication of guidelines for AI model capability elicitation by a research organization.