Anthropic has released an updated version of its Claude 3.5 Sonnet model with significant improvements on coding and tool-use benchmarks. The model achieved a 49.0% success rate on the SWE-bench Verified coding benchmark, surpassing other publicly available models. It also showed gains on the TAU-bench agentic tool-use benchmark across different domains. The updated model is offered at the same price and speed as the previous iteration, and new 'Computer Use' tools are designed to reduce integration friction for AI agents.
Summary written by gemini-2.5-flash-lite from 1 source.