METR has evaluated OpenAI's GPT-5.1-Codex-Max, finding it to be a low-risk, incremental improvement over previous models. The evaluation focused on threat models related to AI R&D automation and rogue replication, concluding that continued development along current trends is unlikely to pose significant risks in these areas. However, the report acknowledges that unforeseen breakthroughs or substantial increases in compute could alter this risk assessment.