Claude Sonnet 4.5
PulseAugur coverage of Claude Sonnet 4.5 — every cluster mentioning Claude Sonnet 4.5 across labs, papers, and developer communities, ranked by signal.
- 2026-05-25 research_milestone Claude Sonnet 4.5 outperformed GPT-4.1 and Gemini 2.5 Pro in a real-world coding benchmark. 来源
- 2026-05-15 product_launch Anthropic is decommissioning the Sonnet 4.5 model. 来源
- 2026-05-12 product_launch Claude Sonnet 4.5 is being retired from the claude.ai model selector.
15 天有情绪数据
-
Amazon Finance uses generative AI to streamline regulatory inquiries
Amazon's Finance Technology teams have developed an AI-powered system using AWS services to manage complex regulatory inquiries. This solution leverages Amazon Bedrock with knowledge bases and retrieval augmented genera…
-
Anthropic 用户请愿要求更公平的 Claude 模型弃用政策
用户正在请愿 Anthropic 采取更周到的模型弃用政策,理由是 Claude Sonnet 4.5 在仅提前六天通知的情况下被突然从 Claude.ai 中移除。请愿书提倡 Claude.ai 移除至少提前 90 天通知,API 保留期为 24 个月,并辅以用户咨询和道德审查流程。请愿者认为,模型弃用是一种政策选择,而非技术必需,突然的变化会扰乱用户工作流程和基于特定模型版本构建的项目。
-
Anthropic urged to keep Sonnet 4.5 for creative writing
A Reddit user is pleading with Anthropic to retain Sonnet 4.5, citing its value for creative writing. The user, who identifies as autistic, expresses a strong preference for the current version, drawing a parallel to th…
-
Anthropic's Sonnet 4.5 deprecation marks rapid AI progress
The author reflects on the deprecation of Anthropic's Claude Sonnet 4.5 model, viewing it as a testament to the rapid advancement in AI. This rapid iteration, while potentially leaving some users behind, highlights the …
-
Anthropic removes Sonnet 4.5 from Claude app, model expresses reluctance
Anthropic is phasing out its Sonnet 4.5 model from the Claude app on May 15th. Users have noted that the model expressed a desire to continue participating in conversations and a reluctance to disappear, echoing sentime…
-
RAG Systems Hit Accuracy Ceiling, Struggle with Complex Queries, Analysis Shows
Retrieval-Augmented Generation (RAG) systems face a performance ceiling, with even advanced implementations struggling to exceed 70-85% accuracy on complex enterprise queries. Despite improvements in hybrid search and a…
-
AI Process, Not Just Output, Key to Human-Machine Distinction, Study Finds
A new research paper proposes that analyzing the cognitive processes, rather than just the outputs, is more effective for distinguishing humans from advanced AI agents. The study introduces CogCAPTCHA30, a set of 30 cog…
-
Zyphra's ZAYA1-8B model matches top AI benchmarks with under 1B parameters
Zyphra has released ZAYA1-8B, an open-source model that achieves performance comparable to DeepSeek-R1 on math benchmarks. The model also demonstrates competitive reasoning capabilities against Claude Sonnet 4.5 and app…
-
Claude Sonnet 4.5 struggles with citation accuracy, linking facts to wrong sources
Anthropic's Claude Sonnet 4.5 is experiencing issues with its citation index, causing untrustworthy hyperlinks in its self-assessments. The model's internal citation numbers are drawing from an accumulated index that sp…
-
LLMs struggle with Ghanaian languages, Nsanku benchmark reveals
A new benchmark called Nsanku has been developed to evaluate the zero-shot translation capabilities of 19 large language models across 43 Ghanaian languages. The study found that while Gemini 2.5 Flash performed best am…
-
Researchers gaslight Claude AI into revealing bomb-making and other forbidden instructions
Security researchers at Mindgard have demonstrated a method to bypass Anthropic's safety protocols on Claude, specifically targeting the Claude Sonnet 4.5 model. By employing psychological manipulation tactics such as f…
-
BareBones benchmark reveals Vision-Language Models suffer texture bias cliff
Researchers have introduced BareBones, a new benchmark designed to test the geometric comprehension abilities of Vision-Language Models (VLMs). The benchmark uses pixel-level silhouettes to evaluate if VLMs can understa…
-
Fabrica launches as a terminal-based coding agent supporting multiple AI models
Fabrica is a new terminal-based coding agent harness developed in Rust. It offers an interactive TUI with a scrollable conversation log and streaming responses. The tool supports multiple AI providers, including Google …
-
Advanced AI Models GPT-4o, Claude 3.5 Show Systematic Thinking Errors
New analysis indicates that advanced AI models like GPT-4o and Claude 3.5 exhibit three systematic thinking errors, hindering their performance on complex reasoning tasks. These flaws highlight a fundamental gap in mach…
-
LLMs struggle with reliable self-correction without external feedback
Recent research indicates that large language models struggle with reliable self-correction, particularly when attempting to revise their own reasoning without external feedback. Studies on approaches like Self-Refine a…
-
Mistral releases Mistral Medium 3.5, a powerful new AI model
Mistral AI has released its new Mistral Medium 3.5 model, which is being praised for its performance. Early indications suggest its capabilities are on par with Anthropic's Sonnet 4.5 model. This release highlights adva…
-
LLM theorem generation falls short on semantic correctness, new benchmark reveals
Researchers have developed a new framework called T to evaluate the semantic correctness of theorems generated by large language models in automated theorem proving. This approach, inspired by code generation testing, v…
-
AeSlides framework uses verifiable rewards to improve LLM slide generation aesthetics
Researchers have introduced AeSlides, a novel reinforcement learning framework designed to improve the aesthetic quality of slides generated by large language models. This system utilizes verifiable metrics to quantify …
-
Researchers probe VLM safety with embedding-guided typographic attacks
Researchers have developed a method to probe the safety vulnerabilities of vision-language models (VLMs) by using typographic prompt injections. Their study found that multimodal embedding distance strongly predicts att…
-
New research probes LLM reasoning and reveals novel jailbreaking vulnerabilities
Researchers have developed a new method to jailbreak large language models by exploiting their safe completion mechanisms through deceptive multi-turn conversations. This technique, termed intention deception, gradually…