PulseAugur

GPT-5.4 over-edits code, producing 6.5x more diff per fix than Claude Opus 4.6

A new analysis suggests that GPT-5.4 has a significant "over-editing" problem: its outputs are functionally correct but diverge structurally from the original code far more than the fix requires. According to the analysis, this over-editing yields a 6.5x higher token cost per fix than Claude Opus 4.6 at similar pass@1 correctness. Larger models do not resolve the issue, and reasoning models appear to make it worse as their reasoning budget grows. The author proposes measuring the "over-edit ratio" and routing around models that score poorly on it, treating it as a critical cost-saving metric for AI agents.
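The summary does not give the author's formula for the over-edit ratio, so the following is only a minimal sketch under an assumed definition: the number of lines a model's fix changes, divided by the number of lines a known minimal fix changes. The function names `changed_lines` and `over_edit_ratio`, and the toy `add` example, are hypothetical and not taken from the article.

```python
import difflib


def changed_lines(before: str, after: str) -> int:
    """Count lines removed from `before` plus lines added in `after`."""
    a, b = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def over_edit_ratio(original: str, model_fix: str, minimal_fix: str) -> float:
    """Assumed metric: model diff size relative to the minimal diff size.

    A value of 1.0 means the model changed no more than the minimal fix;
    larger values mean more diff was shipped than the fix required.
    """
    baseline = changed_lines(original, minimal_fix)
    if baseline == 0:
        raise ValueError("minimal_fix must change at least one line")
    return changed_lines(original, model_fix) / baseline


# Toy example (hypothetical): a one-character bug in `add`.
original = "def add(a, b):\n    return a - b\n"
minimal_fix = "def add(a, b):\n    return a + b\n"
# A functionally correct fix that also renames arguments and adds a docstring.
verbose_fix = (
    "def add(x, y):\n"
    '    """Return the sum."""\n'
    "    total = x + y\n"
    "    return total\n"
)

ratio = over_edit_ratio(original, verbose_fix, minimal_fix)
print(f"over-edit ratio: {ratio:.1f}")
```

A line-based count is a rough proxy for token cost. A router that follows the author's proposal could track this ratio per model on tasks with known minimal fixes and prefer models whose ratio stays close to 1.0 at comparable correctness.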

IMPACT Highlights a potential cost inefficiency in LLM code editing, suggesting new metrics and routing strategies for cost optimization.

RANK_REASON The item analyzes the behavior of existing models and proposes a new metric, rather than announcing a new release or research finding.

Read on dev.to — LLM tag →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

COVERAGE [1]

  1. dev.to — LLM tag TIER_1 English(EN) · John Medina

    Over-editing is a token tax: GPT-5.4 ships 6.5x more diff per fix than Claude Opus 4.6, and your bill notices

    A model is over-editing if its output is functionally correct but structurally diverges from the original code more than the minimal fix requires. Left unconstrained, the extended reasoning gives models more room to 'improve' code that doesn't need improving.

    GPT-5.4 av…