METR has evaluated OpenAI's GPT-5.1-Codex-Max and found it to be a low-risk, incremental improvement over previous models. The evaluation focused on AI R&D automation and rogue replication risks, and concluded that, on current trends, these threats are unlikely to materialize to a significant degree within the next six months. However, METR acknowledges that unforeseen breakthroughs or increases in compute scale could change these projections.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Suggests that current AI development trends pose a low near-term risk of AI R&D automation and rogue replication.
RANK_REASON The report is an evaluation of a specific model's safety implications, not a release of a new model or a major policy shift.