ENTITY SWE-bench Pro

PulseAugur coverage of SWE-bench Pro — every cluster mentioning SWE-bench Pro across labs, papers, and developer communities, ranked by signal.

Total · 30d · 6 over 90d

Releases · 30d · 0 over 90d

Papers · 30d · 3 over 90d

RECENT · PAGE 1/1 · 6 TOTAL

SIGNIFICANT · CL_19920 · May 6 · 19:39

Z.AI's GLM 5.1 model leads in long-horizon agentic tasks, outperforming rivals

Z.AI has released its GLM 5.1 model, an open-source option designed for long-horizon agentic tasks capable of running autonomously for up to 8 hours. This model reportedly outperforms GPT-5.4, Claude Opus 4.6, and Gemin…

RESEARCH · CL_07734 · Apr 28 · 16:17

Poolside AI releases open-weight Laguna XS.2 and M.1 coding models

Poolside AI has released two new agentic coding models, Laguna M.1 and Laguna XS.2, along with their agent training and operation runtime. Laguna M.1 is a large Mixture of Experts (MoE) model trained on 30T tokens using…

RESEARCH · CL_03449 · Apr 8 · 00:00

Anthropic's Claude Mythos finds zero-days; GLM-5.1 targets long tasks

Anthropic's Claude Mythos Preview has demonstrated a significant capability in identifying zero-day vulnerabilities in critical software, leading to the formation of Project Glasswing to enhance cybersecurity. Meanwhile…

RESEARCH · CL_00777 · Feb 23 · 20:03

OpenAI abandons SWE-bench Verified due to flawed tests and data contamination

OpenAI has announced it will no longer use SWE-bench Verified to evaluate the coding capabilities of frontier AI models. The benchmark has become contaminated, with models showing improved scores primarily due to exposu…

FRONTIER RELEASE · CL_01748 · May 22 · 05:44

Anthropic's Claude Opus 4.7

Anthropic has launched Claude Design, a new product integrated with its Claude Opus 4.7 model, enabling users to collaborate on visual content creation. This tool allows for the generation and refinement of designs, pro…

FRONTIER RELEASE · CL_00980 · Nov 5 · 08:00

OpenAI launches GPT-5.5, boosting AI intelligence and speed for complex tasks

OpenAI has released GPT-5.5 and GPT-5.5 Pro, their latest and most intuitive models, designed for complex tasks and agentic capabilities. These models excel in areas like coding, data analysis, and operating software, o…
