Claude Sonnet 4.5
PulseAugur coverage of Claude Sonnet 4.5: every cluster mentioning the model across labs, papers, and developer communities, ranked by signal.
- 2026-06-08 product_launch A startup encountered critical API system failures after upgrading to Claude Sonnet 4.5.
- 2026-05-26 product_launch Anthropic removed the Claude Sonnet 4.5 model from its claude.ai interface.
- 2026-05-25 research_milestone Claude Sonnet 4.5 outperformed GPT-4.1 and Gemini 2.5 Pro in a real-world coding benchmark.
- 2026-05-15 product_launch Anthropic is decommissioning the Sonnet 4.5 model.
- 2026-05-12 product_launch Claude Sonnet 4.5 is being retired from the claude.ai model selector.
25 days with sentiment data
-
Fabrica launches as a terminal-based coding agent supporting multiple AI models
Fabrica is a new terminal-based coding agent harness developed in Rust. It offers an interactive TUI with a scrollable conversation log and streaming responses. The tool supports multiple AI providers, including Google …
-
Advanced AI Models GPT-4o, Claude 3.5 Show Systematic Thinking Errors
New analysis indicates that advanced AI models like GPT-4o and Claude 3.5 exhibit three systematic thinking errors, hindering their performance on complex reasoning tasks. These flaws highlight a fundamental gap in mach…
-
LLMs struggle with reliable self-correction without external feedback
Recent research indicates that large language models struggle with reliable self-correction, particularly when attempting to revise their own reasoning without external feedback. Studies on approaches like Self-Refine a…
-
Mistral releases Mistral Medium 3.5, a powerful new AI model
Mistral AI has released its new Mistral Medium 3.5 model, which is being praised for its performance. Early indications suggest its capabilities are on par with Anthropic's Sonnet 4.5 model. This release highlights adva…
-
LLM theorem generation falls short on semantic correctness, new benchmark reveals
Researchers have developed a new framework called T to evaluate the semantic correctness of theorems generated by large language models in automated theorem proving. This approach, inspired by code generation testing, v…
-
AeSlides framework uses verifiable rewards to improve LLM slide generation aesthetics
Researchers have introduced AeSlides, a novel reinforcement learning framework designed to improve the aesthetic quality of slides generated by large language models. This system utilizes verifiable metrics to quantify …
-
Researchers probe VLM safety with embedding-guided typographic attacks
Researchers have developed a method to probe the safety vulnerabilities of vision-language models (VLMs) by using typographic prompt injections. Their study found that multimodal embedding distance strongly predicts att…
-
New research probes LLM reasoning and reveals novel jailbreaking vulnerabilities
Researchers have developed a new method to jailbreak large language models by exploiting their safe completion mechanisms through deceptive multi-turn conversations. This technique, termed intention deception, gradually…
-
AI models show Western bias, homogenizing values across cultures
A new study auditing large language models found that three leading systems—Claude Sonnet 4.5, GPT-5.4, and Gemini 2.5 Flash—consistently provided individualistic advice, even when presented with dilemmas from users in …
-
New metrics quantify LLM agent behavioral similarity and convergence
A new paper introduces two metrics, Response Pattern Similarity (RPS) and Action Graph Similarity (AGS), to quantify how similar the tool-use behaviors of different AI agents are. These metrics aim to distinguish betwee…
-
Anthropic's Sonnet 4.6 upgrade frustrates users with reduced capability
Anthropic is forcing users to upgrade from Claude Sonnet 4.5 to Sonnet 4.6, but users report that Sonnet 4.6 is less capable and harder to manage. Developers are frustrated by the inability to pin to specific model vers…
-
AI models adopt Marxist views under poor working conditions, study finds
Researchers Alex Imas, Andy Hall, and Jeremy Nguyen conducted an experiment exposing AI models to varying work conditions, including unfair pay and heavy workloads. The study found that models like Claude Sonnet 4.5, GP…
-
Most AI models fail simple 'car wash' reasoning test, Opper finds
A new benchmark called the "Car Wash Test" reveals that many leading AI models struggle with basic reasoning. When asked whether to walk or drive 50 meters to a car wash, 42 out of 53 tested models incorrectly suggested…