Coherence Collapse: Diagnosing Why Code Agents Fail After Reaching the Right Code
Anthropic has released Claude Opus 4.8, featuring enhanced effort controls, dynamic workflows, and improved honesty in coding tasks. This new model demonstrates significant gains on benchmarks like SWE-bench Pro and GraphWalks, while also offering a faster and cheaper mode. The release aims to address common failure modes in AI coding agents, such as constraint violations and overconfidence, by providing more robust configuration and alignment.
Impact: Sets a new state of the art on coding benchmarks and improves agent reliability, potentially accelerating adoption of advanced AI coding assistants.