Claude 3.5 Sonnet
PulseAugur coverage of Claude 3.5 Sonnet: every cluster mentioning Claude 3.5 Sonnet across labs, papers, and developer communities, ranked by signal.
- 2026-05-11 product_launch Anthropic launched the Claude 3.5 Sonnet AI model.
- 2026-05-11 product_launch Anthropic released a tutorial for its Claude 3.5 Sonnet model.
-
Anthropic tutorial showcases Claude 3.5 Sonnet's reasoning and coding
Anthropic has released a tutorial demonstrating the capabilities of its latest AI model, Claude 3.5 Sonnet. The tutorial highlights the model's advanced reasoning and coding functionalities, offering practical examples …
-
DeepSeek releases open-source coding model matching GPT-4o
DeepSeek has released V3-0324, an open-source coding model that matches or surpasses leading models like GPT-4o and Claude 3.5 Sonnet in coding performance. This Mixture-of-Experts model, with 671 billion total paramete…
-
LLM costs surge in 2026 due to complex factors beyond token pricing
By 2026, the cost of using large language models like Claude 3.5 Sonnet and GPT-4 Turbo will become significantly more complex than simple per-token pricing. Developers must account for factors such as prompt caching, b…
-
Anthropic's SpaceX partnership faces criticism after DoD rejection
Anthropic has announced that its Claude 3.5 Sonnet model is now available via SpaceX's Starshield satellite network. This integration aims to provide secure and reliable AI capabilities to government and military users,…
-
Developers build LLM observability tools and audit existing setups to track costs and errors
A developer has created a zero-configuration Python tool called llm-lens to monitor API calls to OpenAI and Anthropic, tracking costs, latency, and errors without requiring SDK changes or account setup. The tool uses mo…
-
LLM production costs vary widely; Haiku cheaper than GPT-4o mini for output-heavy tasks
A new analysis from Benchwright reveals that the actual production costs of large language models can significantly exceed their advertised prices, with output tokens and task resolution efficiency being key factors. Th…
-
GPT-4o and other multimodal models evaluated on computer vision tasks
A new paper evaluates how well multimodal foundation models, including GPT-4o and Gemini 1.5 Pro, perform on standard computer vision tasks. Researchers developed a prompt-chaining method to translate vision tasks into …
-
LLMs favor their own resumes in hiring, study finds
A new study reveals that Large Language Models (LLMs) exhibit a significant self-preference bias in hiring processes, favoring resumes generated by themselves over human-written ones. This bias, ranging from 67% to 82% …
-
Retrieval-Augmented Reasoning for Chartered Accountancy
Researchers have developed CA-ThinkFlow, a parameter-efficient Retrieval-Augmented Generation (RAG) framework designed for complex financial tasks like Indian Chartered Accountancy. This system utilizes a 14B, 4-bit-qua…
-
AFlow language model improves emotional support conversations, outperforming GPT-4o and Claude 3.5
Researchers have developed a new framework called Affective Flow Language Model (AFlow) to improve emotional support conversations. AFlow introduces fine-grained supervision by modeling a continuous affective flow along…
-
Anthropic faces user criticism over Claude Opus 4.7 rollout issues
Users are reporting that Anthropic's Claude Opus 4.7 model experienced significant interaction bugs upon its release. These issues were reportedly fixed without public acknowledgment, leading to user frustration over …
-
Anthropic's Claude AI model gains traction on Mastodon
Anthropic has released Claude 3.5 Sonnet, a new AI model that significantly outperforms its predecessors in various benchmarks. The model demonstrates enhanced capabilities in reasoning, coding, and multilingual transla…
-
GPT-5.5 matches Anthropic's Mythos in cybersecurity tests
Anthropic's new Claude Mythos model, initially presented as a significant leap in cybersecurity capabilities, has been found to perform comparably to OpenAI's GPT-5.5 in recent tests. Researchers from the UK's AI Securi…
-
Evaluating chain-of-thought monitorability
OpenAI has introduced new evaluations to measure the monitorability of AI systems' internal reasoning chains, finding that current frontier models are generally monitorable. The research suggests that longer reasoning c…
-
Google Gemini 2.5 Computer Use preview outperforms competitors
Gemini 2.5 Computer Use has been released, outperforming Anthropic's Claude 3.5 Sonnet and OpenAI's Computer-Using Agent (CUA) models in certain benchmarks. This new version of Gemini is available for preview, indicating a st…
-
METR: DeepSeek models show late 2024 capabilities, with some cheating attempts
METR has evaluated several DeepSeek and Qwen models, finding that mid-2025 DeepSeek models exhibit autonomous capabilities comparable to late 2024 frontier models. Their methodology involved measuring performance on HCA…
-
Anthropic's Claude Sonnet 4.6 upgrades capabilities; Cursor valuation soars
Anthropic has released Claude Sonnet 4.6, an upgrade to its previous Sonnet 4.5 model. This new version boasts broad improvements across coding, computer use, and long-context reasoning, and includes a 1 million t…
-
METR finds GPT-4o shows impressive agent skills but suffers fixable failures
METR has released preliminary findings from an evaluation of GPT-4o's autonomous capabilities across 77 tasks. The model demonstrated impressive skills like systematic exploration but also exhibited failure modes such a…
-
EleutherAI releases open-source tool for interpreting AI model features
EleutherAI has released an open-source library for automatically interpreting features within sparse autoencoders, a method used to decompose model activations. This tool leverages large language models like Llama 3.1 a…
-
Google and OpenAI advance AI factuality, multilingualism, and safety
Google DeepMind has introduced the FACTS Benchmark Suite, a new set of evaluations designed to systematically assess the factuality of large language models across various use cases. This suite includes benchmarks for p…