GPT-5.4 over-edits code, producing 6.5x more diff per fix than Claude Opus 4.6

By PulseAugur Editorial · [1 source] · 2026-06-03 14:08

A new analysis finds that GPT-5.4 tends to over-edit: its fixes are functionally correct but diverge structurally from the original code far more than the minimal fix requires. The author frames this as a "token tax". For the same fix, GPT-5.4 produces 6.5 times more diff, and therefore more output tokens, than Claude Opus 4.6. At scale, this inefficiency translates into substantial cost increases, with reported monthly overages of more than $1,650 per 40,000 edits. The analysis argues that switching to smaller models or raising reasoning budgets does not solve the problem. Instead, operators should measure and manage an "over-edit ratio" as a key performance indicator for AI agents.
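The cost arithmetic behind the "token tax" can be sketched roughly as follows. This is an illustration under stated assumptions, not the analysis's own cost model: the per-edit baseline token count and the output-token price are hypothetical inputs, and "6.5x more" is read as 6.5 times as many output tokens. The only figures taken from the source are the $1,650 monthly overage and the 40,000 edits, which together imply roughly $0.04 of overage per edit.

```python
# Rough sketch of the "token tax" arithmetic. baseline_tokens and
# usd_per_million_output_tokens are HYPOTHETICAL inputs, not values from the analysis.


def extra_tokens_per_edit(baseline_tokens: float, multiplier: float) -> float:
    """Extra output tokens per edit if a model emits `multiplier` times the baseline."""
    return baseline_tokens * (multiplier - 1)


def monthly_overage_usd(edits: int, baseline_tokens: float, multiplier: float,
                        usd_per_million_output_tokens: float) -> float:
    """Cost of the extra output tokens across `edits` edits in a month."""
    extra = extra_tokens_per_edit(baseline_tokens, multiplier) * edits
    return extra / 1_000_000 * usd_per_million_output_tokens


# Implied overage per edit from the reported figures ($1,650 per 40,000 edits).
reported_per_edit_usd = 1650 / 40_000

# Hypothetical example: 500-token baseline fix, 6.5x multiplier, $10 per 1M output tokens.
example = monthly_overage_usd(40_000, 500, 6.5, 10.0)
print(f"implied overage per edit: ${reported_per_edit_usd:.3f}")  # $0.041
print(f"hypothetical monthly overage: ${example:,.2f}")  # $1,100.00
```

With these made-up inputs the overage comes to $1,100 a month. The specific number matters less than the shape of the formula: the overage grows linearly with edit volume and with (multiplier - 1).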

IMPACT: Highlights significant cost inefficiencies in current LLMs for code generation tasks, urging operators to adopt new metrics for cost control.

RANK_REASON: This is an analysis and commentary on existing model behavior and its cost implications, not a new model release or benchmark.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

dev.to — LLM tag TIER_1 English(EN) · John Medina · 2026-06-03 14:08

Over-editing is a token tax: GPT-5.4 ships 6.5x more diff per fix than Claude Opus 4.6, and your bill notices

A model is over-editing if its output is functionally correct but structurally diverges from the original code more than the minimal fix requires. Left unconstrained, the extended reasoning gives models more room to 'improve' code that doesn't need improving. GPT-5.4 av…
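One way to turn this definition into the "over-edit ratio" the analysis recommends tracking is to compare the size of the model's diff against the size of a minimal reference fix. The sketch below assumes a line-level diff from Python's standard difflib and a hand-written minimal fix. The source does not specify how its ratio is computed, and the function names here are hypothetical.

```python
import difflib


def changed_lines(before: str, after: str) -> int:
    """Count lines removed plus lines added in a line-level diff (hypothetical helper)."""
    a, b = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    count = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced line counts once as removed and once as added.
            count += (i2 - i1) + (j2 - j1)
    return count


def over_edit_ratio(original: str, model_output: str, minimal_fix: str) -> float:
    """Model diff size divided by minimal-fix diff size (assumed definition).

    1.0 means the model changed exactly as many lines as the minimal fix;
    larger values indicate over-editing.
    """
    minimal = changed_lines(original, minimal_fix)
    if minimal == 0:
        raise ValueError("minimal_fix must differ from original")
    return changed_lines(original, model_output) / minimal


original = "def add(a, b):\n    return a - b\n"
minimal_fix = "def add(a, b):\n    return a + b\n"
model_output = (
    "def add(x: int, y: int) -> int:\n"
    '    """Add two integers."""\n'
    "    return x + y\n"
)
print(over_edit_ratio(original, model_output, minimal_fix))  # 2.5
```

In the example, the model fixes the bug but also renames parameters and adds type hints and a docstring, so it touches 5 lines where the minimal fix touches 2, giving a ratio of 2.5. A token-level or AST-level diff would be a stricter measure, at the cost of more machinery.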
