METR has released guidelines for evaluating AI model capabilities, focusing on elicitation protocols used to assess potential risks. The guidelines recommend measuring a model's abilities after post-training enhancement, since realistic threat models may involve finetuning, prompting, or other modifications. Evaluating an enhanced model, rather than its raw state, yields a more accurate risk assessment because it accounts for how models might be improved or manipulated. The recommended protocol proceeds in stages: apply basic elicitation, observe the resulting failure modes, then apply targeted effort against those failure modes to gauge the capabilities that are actually reachable.
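The staged protocol described above can be sketched as a simple loop. This is an illustrative assumption of how such an evaluation harness might be structured, not METR's actual tooling; all names (`run_basic_eval`, `elicit`, the scoring and failure-mode logic) are hypothetical stubs.

```python
# Hypothetical sketch of staged elicitation: basic eval -> observe failure
# modes -> targeted fixes -> re-evaluate. All names and numbers are
# illustrative stubs, not METR's actual methodology or tooling.
from dataclasses import dataclass, field

@dataclass
class EvalResult:
    score: float                           # fraction of tasks solved
    failure_modes: list = field(default_factory=list)

def run_basic_eval(model, prompt_tweaks):
    # Stub: a real harness would run the model on a task suite and
    # categorize its failures. Here, each tweak "fixes" some tasks.
    solved = 0.4 + 0.1 * len(prompt_tweaks)
    modes = [] if solved >= 0.6 else ["gives_up_early"]
    return EvalResult(score=min(solved, 1.0), failure_modes=modes)

def elicit(model, budget=3):
    """Iteratively apply targeted fixes for observed failure modes."""
    tweaks = []
    result = run_basic_eval(model, tweaks)     # stage 1: basic elicitation
    for _ in range(budget):
        if not result.failure_modes:           # stage 2: nothing left to fix
            break
        # stage 3: targeted effort, e.g. one prompt patch per failure mode
        tweaks.extend(f"patch:{m}" for m in result.failure_modes)
        result = run_basic_eval(model, tweaks)
    return result

best = elicit(model="stub-model")
print(round(best.score, 2))
```

The point of the loop is that the reported capability score is the one reached after targeted effort, not the raw first-pass score.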
Summary written by gemini-2.5-flash-lite from 1 source.