Brief · PulseAugur

FRONTIER RELEASE · arXiv cs.AI English(EN) · 2w · [9 sources]

Coherence Collapse: Diagnosing Why Code Agents Fail After Reaching the Right Code

Anthropic has released Claude Opus 4.8, featuring enhanced effort controls, dynamic workflows, and improved honesty in coding tasks. This new model demonstrates significant gains on benchmarks like SWE-bench Pro and GraphWalks, while also offering a faster and cheaper mode. The release aims to address common failure modes in AI coding agents, such as constraint violations and overconfidence, by providing more robust configuration and alignment. AI

IMPACT Sets new SOTA on coding benchmarks and improves agent reliability, potentially accelerating adoption of advanced AI coding assistants.

Anthropic
GPT-5.5
Claude Opus 4.7
Gemini 3.1 Pro
Grok
GitHub Copilot
SWE-bench Pro
Amazon Bedrock
Microsoft Foundry
Google Cloud Vertex AI
Claude Opus 4.8
GraphWalks