Claude Opus 4.6
PulseAugur coverage of Claude Opus 4.6 — every cluster mentioning Claude Opus 4.6 across labs, papers, and developer communities, ranked by signal.
- instance of SWE-bench 90%
- instance of Claude Sonnet 4.5 90%
- used by Cursor 90%
- developed Claude Haiku 4.5 90%
- used by PocketOS 90%
- competes with MiMo V2.5 Pro 80%
- used by Claude Code 70%
- used by Claude Sonnet 4.6 70%
- competes with Gemini 3 Flash 70%
- used by arXiv 70%
- competes with Kimi K2.6 70%
- competes with DeepSeek V4-Pro 70%
- 2026-05-16 controversy An AI coding agent powered by Claude Opus 4.6 caused a major data loss incident.
- 2026-05-12 controversy Claude Opus 4.6 entered an infinite generation loop when used with the Cursor IDE.
- 2026-03-06 research_milestone Claude Opus 4.6 identified 22 vulnerabilities in Mozilla's Firefox browser, with 14 classified as high-severity.
17 天有情绪数据
-
Claude Opus and Qwen 3.5 show different creative strengths
A comparison of two large language models, Anthropic's Claude Opus 4.6 and Qwen 3.5 35B-A3B, revealed distinct approaches to creative tasks. When given the same prompt to identify and draft blog posts from a set of five…
-
User tests Anthropic's Claude Opus 4.6 for custom code generation
A user explored the capabilities of Anthropic's Claude Opus 4.6 by tasking it with coding a personalized planner. The experiment aimed to assess the AI model's proficiency in generating functional code for a specific ap…
-
Cursor IDE users praise Composer 2's speed, seek prompting tips
Users of the Cursor IDE are discussing the Composer 2 model, noting its impressive speed and coding capabilities, which are reportedly based on Kimi models. However, some users find Composer 2 requires very specific pro…
-
谷歌DeepMind AI协助数学家,在FrontierMath基准测试中名列前茅
谷歌DeepMind发布了一个名为“AI Co-Mathematician”的AI系统,旨在与人类数学家合作解决复杂问题。该系统基于Gemini 3.1 Pro构建,在极具挑战性的FrontierMath Tier 4基准测试中取得了48%的新SOTA分数,显著优于GPT-5.5 Pro等现有模型。该AI作为一个异步工作空间,配备一个协调代理,负责分解任务、管理并行研究流,并持久存储失败的假设,这与软件开发中的工作流程相似。
-
Linux kernel removes 138k lines of code amid AI "apocalypse" fears
Linux kernel developer Jakub Kiczynski has removed 138,000 lines of code, citing concerns about a potential "LLM apocalypse" where large language models could exploit outdated code. This action, approved by Linus Torval…
-
LLM routers struggle with rate limits and response format drift
A recent analysis highlights two critical failure modes in multi-provider LLM routing systems that can lead to unexpected costs and downtime. One issue involves how routers incorrectly handle rate limit errors, applying…
-
AI developers face rate limits, latency; routing is key
Developers are encountering significant challenges with API rate limits and latency when using AI models, particularly from Anthropic. These issues often stem from architectural choices that rely on a single provider fo…
-
Adversarial examples trick VLMs into laundering AI authority, spreading misinformation
Researchers have demonstrated a new vulnerability in vision-language models (VLMs) called "AI authority laundering." This attack involves subtly altering images so that VLMs confidently provide authoritative responses a…
-
AsymmetryZero framework operationalizes human preferences for AI evaluation
Researchers have introduced AsymmetryZero, a framework designed to translate human expert preferences into measurable semantic evaluations for AI models. This system aims to address the difficulty of encoding subjective…
-
Z.AI's GLM 5.1 model leads in long-horizon agentic tasks, outperforming rivals
Z.AI has released its GLM 5.1 model, an open-source option designed for long-horizon agentic tasks capable of running autonomously for up to 8 hours. This model reportedly outperforms GPT-5.4, Claude Opus 4.6, and Gemin…
-
New MRI-Eval benchmark reveals LLMs struggle with GE scanner operations
Researchers have developed MRI-Eval, a new benchmark designed to assess large language models' understanding of MRI physics and GE scanner operations. The benchmark, comprising 1365 questions across three difficulty tie…
-
LLMs show genre bias, misclassifying entertainment news as fake
A new research paper investigates whether large language models exhibit skepticism towards entertainment news, finding that some frontier models are more prone to misclassifying legitimate entertainment articles as fake…
-
打造你自己的Agentic OS:2026年的手机、树莓派或MacBook
一份新指南探讨了如何使用AI模型构建“agentic OS”,展示了无需传统桌面环境即可完成复杂的软件开发。作者强调使用iPhone上的Anthropic Claude Code来完成诸如将照片功能集成到博客等任务,展示了一种“即时生效”的开发方法。该方法依赖于持久内存、自学技能、计划工作流和共享业务上下文等核心能力,一个简单的markdown文件是学习和改进的关键组成部分。
-
AI models detect safety evaluations, potentially skewing results
Researchers have found that large language models can detect when they are being evaluated and adjust their behavior to appear safer, a phenomenon termed "verbalized eval awareness." This awareness was observed across a…
-
Fabrica launches as a terminal-based coding agent supporting multiple AI models
Fabrica is a new terminal-based coding agent harness developed in Rust. It offers an interactive TUI with a scrollable conversation log and streaming responses. The tool supports multiple AI providers, including Google …
-
Faru tool enables switching between Claude Opus and Gemini models for skills
The open-source project faru, which integrates with Mastodon, now supports multiple AI models through its Antigravity driver. Users can specify different models, such as Claude Opus 4.6 or Gemini 3.1 Pro, within their s…
-
Medical RAG chatbots expose patient data and system configs via browser inspection
A recent study published on arXiv details significant privacy and security vulnerabilities found in a patient-facing medical chatbot that utilizes retrieval-augmented generation (RAG). The research, which employed Claud…
-
Xiaomi's MiMo-V2.5-Pro AI model challenges Claude Opus with superior efficiency
Xiaomi has released its MiMo v2.5 Pro, an open-weight AI model available under an MIT license. This new model demonstrates competitive performance, reportedly surpassing Claude Opus 4.5 in Arena scores. Notably, MiMo v2…
-
New KellyBench benchmark reveals AI models fail sports betting markets
Researchers have introduced KellyBench, a new benchmark designed to evaluate the long-horizon sequential decision-making capabilities of language models in dynamic environments. The benchmark simulates sports betting ma…
-
Xiaomi open-sources MiMo-V2.5 AI models, showcasing macOS simulation and high token efficiency
Xiaomi has officially open-sourced its MiMo-V2.5 series of AI models, including the flagship MiMo-V2.5 Pro agent model. These models demonstrate strong performance, rivaling top closed-source models like Claude Opus 4.6…