A new analysis suggests that GPT-5.4 exhibits a significant "over-editing" problem, producing outputs that are functionally correct but structurally diverge from the original code much more than necessary. This over-editing results in a 6.5x higher token cost for fixes compared to Claude Opus 4.6, with similar pass@1 correctness. The issue is not resolved by using larger models, as reasoning models appear to worsen the problem with increased budget. The author proposes measuring and routing around this "over-edit ratio" as a critical cost-saving metric for AI agents. AI
IMPACT Highlights a potential cost inefficiency in LLM code editing, suggesting new metrics and routing strategies for cost optimization.
RANK_REASON The item analyzes the behavior of existing models and proposes a new metric, rather than announcing a new release or research finding.
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →