Anthropic has released an updated version of its Claude 3.5 Sonnet model, with significant improvements on coding and tool-use benchmarks. The model achieved a 49.0% success rate on the SWE-bench Verified coding task, surpassing other publicly available models. It also showed gains across domains on the TAU-bench agentic tool-use task. The updated model is offered at the same price and speed as the previous iteration, with new 'Computer Use' tools designed to reduce integration friction for AI agents.
Read on Latent Space Podcast →
AI-generated summary · Google Gemini · from 1 source.