METR finds GPT-5.1-Codex-Max poses low risk for AI R&D automation

By PulseAugur Editorial · [1 source] · 2025-11-19 08:00

METR has evaluated OpenAI's GPT-5.1-Codex-Max and found it to be a low-risk, incremental improvement over previous models. The evaluation focused on two threat models, AI R&D automation and rogue replication, and concluded that on current trends neither is likely to pose a significant risk within the next six months. METR acknowledges, however, that unforeseen breakthroughs or a substantial increase in compute scale could change these projections.

IMPACT Suggests current AI development trends pose low risk for AI R&D automation and rogue replication in the near term.

RANK_REASON The report is an evaluation of a specific model's safety implications, not a release of a new model or a major policy shift.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

METR (Model Evaluation & Threat Research) TIER_1 (CA) · 2025-11-19 08:00

GPT-5.1-Codex-Max Evaluation Results

Note on independence: This evaluation was conducted under a standard NDA. Due to the se…
