METR has evaluated OpenAI's GPT-5.1-Codex-Max, finding it to be a low-risk, incremental improvement over previous models. The evaluation focused on threat models related to AI R&D automation and rogue replication, concluding that continued development along current trends is unlikely to pose significant risks in these areas. However, the report acknowledges that unforeseen breakthroughs or substantial increases in compute could alter this risk assessment.