Claude 4.6 Opus
PulseAugur coverage of Claude 4.6 Opus — every cluster mentioning Claude 4.6 Opus across labs, papers, and developer communities, ranked by signal.
2 days with sentiment data
-
New STT-Arena benchmark reveals LLMs struggle with dynamic environments
Researchers have introduced STT-Arena, a new benchmark designed to evaluate large language models' ability to adapt and replan in dynamic environments with spatio-temporal changes. The benchmark consists of 227 interact…
-
Microsoft Research: LLMs corrupt 25% of documents in delegated tasks
A new benchmark, DELEGATE-52, developed by Microsoft Research, reveals that current large language models significantly corrupt documents during delegated workflows. Even advanced models like Gemini 3.1 Pro, Claude 4.6 …
-
DeepSeek-V4 launches with 1M context, Chinese hardware optimization
DeepSeek has officially released its latest flagship model, DeepSeek-V4, featuring a 1 million token context window and enhanced agent capabilities. The model comes in two versions, Pro and Flash, with the Pro version s…
-
LLM judges evaluate agentic stock predictors, improving accuracy via reinforcement learning
Researchers have developed a novel framework for evaluating agentic stock prediction systems by utilizing large language models as judges. This system breaks down performance into six specific dimensions, including regi…
-
Medical thinking with multiple images
Researchers have developed MIRAGE, a system designed to aid medical education by retrieving and generating multimodal medical images and texts. MIRAGE utilizes a fine-tuned CLIP model (MedICaT-ROCO) and a diffusion mode…
-
FINAL-Bench/Darwin-36B-Opus · Hugging Face
The Darwin-36B-Opus model, a 36-billion-parameter mixture-of-experts language model, has been released. It was created using the Darwin V7 evolutionary breeding engine, combining aspects of Qwen/Qwen3.6-35B-A3B and a Cl…