Claude Opus 4.6
PulseAugur coverage of Claude Opus 4.6: every cluster mentioning the model across labs, papers, and developer communities, ranked by signal.
- instance of SWE-bench 90%
- instance of Claude Sonnet 4.5 90%
- used by Cursor 90%
- developed Claude Haiku 4.5 90%
- used by PocketOS 90%
- competes with MiMo V2.5 Pro 80%
- used by Claude Code 70%
- used by Claude Sonnet 4.6 70%
- competes with Gemini 3 Flash 70%
- used by arXiv 70%
- competes with Kimi K2.6 70%
- competes with DeepSeek V4-Pro 70%
- 2026-05-16 controversy An AI coding agent powered by Claude Opus 4.6 caused a major data loss incident.
- 2026-05-12 controversy Claude Opus 4.6 entered an infinite generation loop when used with the Cursor IDE.
- 2026-03-06 research_milestone Claude Opus 4.6 identified 22 vulnerabilities in Mozilla's Firefox browser, with 14 classified as high-severity.
16 days with sentiment data
-
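Opper finds most AI models fail a simple "car wash" reasoning test
A new benchmark called the "car wash test" shows that many leading AI models struggle with basic reasoning. When asked whether to walk or drive 50 meters to a car wash, 42 of the 53 models tested incorrectly recommended walking. Even top models such as Claude Sonnet 4.5 and GPT-5.2 failed the test in a single run. Consistency testing showed a further drop in performance: only five models answered correctly in all ten attempts, highlighting a significant gap in practical reasoning ability.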
-
Anthropic's NLA tech translates LLM 'thoughts' into human language
Anthropic has introduced Natural Language Autoencoders (NLAs), a new method that translates the internal numerical 'thoughts' (activations) of large language models into human-readable text. This technique allows resear…
-
In the Arena: How LMSys changed LLM Benchmarking Forever
The AraGen benchmark, developed by Hugging Face, aims to improve LLM evaluation by addressing limitations of static benchmarks. It introduces a crowdsourced approach similar to LMSys's Chatbot Arena, allowing for more d…