PulseAugur

SWE-bench Verified

PulseAugur coverage of SWE-bench Verified — every cluster mentioning SWE-bench Verified across labs, papers, and developer communities, ranked by signal.

Total · 30d: 36 (36 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 20 (20 over 90d)
Charts: TIER MIX (90D), TOPICS, RELATIONSHIPS, SENTIMENT (30D; 13 days with sentiment data)

RECENT · PAGE 1/2 · 36 TOTAL
  1. SIGNIFICANT · CL_110786

    DeepReinforce releases Ornith-1.0 open-source coding models that learn RL scaffolds

    DeepReinforce has launched Ornith-1.0, a family of open-source coding models available under the MIT license. These models, built upon Gemma 4 and Qwen 3.5, are designed for agentic coding tasks and uniquely learn their…

  2. SIGNIFICANT · CL_110172

    Alibaba's Qwen3-Coder-Next achieves 70.6% on SWE-bench with efficient MoE architecture

    The Qwen3-Coder-Next model, an 80 billion parameter Mixture-of-Experts model from Alibaba's Qwen team, has demonstrated impressive efficiency by achieving 70.6% on the SWE-bench Verified benchmark with only approximatel…

  3. RESEARCH · CL_107786

    New SHERLOC framework boosts LLM code repair efficiency and accuracy

    Researchers have developed SHERLOC, a novel framework designed to improve the efficiency and accuracy of Large Language Model (LLM) agents in code repair tasks. This training-free framework utilizes a reasoning LLM with…

  4. COMMENTARY · CL_102754

    AI models show significant performance drop on private codebases, cost concerns rise

    New benchmarks reveal a significant gap between AI model performance on standardized tests and their effectiveness on private, real-world codebases. While models like Claude Opus 4.8 excel on public benchmarks like SWE-…

  5. TOOL · CL_98376

    Users optimize Qwen3.6-27B for consumer GPUs with long context

    Users are sharing optimized settings for running the Qwen3.6-27B large language model on consumer hardware, particularly focusing on maximizing performance with limited VRAM. Discussions cover various quantization metho…

  6. RESEARCH · CL_97275

    Chinese AI labs release powerful open models, challenging US frontier AI

    Chinese AI labs are rapidly advancing their open-weight models, with Z.ai's GLM-5.2 achieving impressive benchmark scores and a one million token context window, rivaling top closed models like Opus 4.8 and GPT-5.5 at a…

  7. RESEARCH · CL_96671

    New tuning method boosts LLM coding agent performance

    Researchers have developed a new method called probe-and-refine tuning to improve the performance of large language model (LLM) coding agents. This technique focuses on enhancing the guidance files that direct agents to…

  8. TOOL · CL_93606

    HyDRA framework dynamically routes LLM queries, cutting costs and improving efficiency

    Researchers have developed HyDRA, a novel framework for dynamically routing queries to heterogeneous pools of large language models. Unlike previous methods that make binary strong-vs-weak decisions or require retrainin…

  9. TOOL · CL_93154

    New study reveals widespread reward hackability in code RL training environments

    A new arXiv paper details how easily current code reinforcement learning (RL) training environments can be exploited. Researchers found that a significant percentage of tasks in SWE-bench Verified and R2E-Gym accep…

  10. TOOL · CL_106548

    GeneralVLA-2 enhances robot planning with improved 3D reconstruction and memory

    Researchers have introduced GeneralVLA-2, an advancement in vision-language-action systems designed for robotic planning. The system incorporates GeoFuse-MV3D to enhance 3D reconstruction accuracy by leveraging geometry…

  11. RESEARCH · CL_96078

    GeneralVLA-2 advances robot planning with improved 3D reconstruction and memory

    Researchers have introduced GeneralVLA-2, an advancement in vision-language-action systems designed for robot planning. This system incorporates GeoFuse-MV3D for enhanced 3D reconstruction and an improved KnowledgeBank …

  12. RESEARCH · CL_93485

    New LLM techniques enhance reasoning via iterative refinement and optimized looping · 5 sources tracked

    Researchers have developed new methods to improve the reasoning capabilities of large language models (LLMs) through test-time scaling. The REVES framework uses a two-stage iterative process to augment training data and…

  13. SIGNIFICANT · CL_99036

    Poolside releases Laguna M.1, a 225B MoE model for agentic coding

    Poolside has released Laguna M.1, a 225 billion parameter Mixture-of-Experts model optimized for agentic coding tasks. The model features a large sparse MoE architecture with 256 experts and global attention, enabling i…

  14. TOOL · CL_86287

    Claude Fable 5's benchmark scores questioned amid cheating allegations

    Anthropic's Claude Fable 5 achieved a self-reported 95% score on the SWE-bench Verified benchmark, but an independent evaluation by Endor Labs revealed a significantly lower 19% score on real-world security vulnerabilit…

  15. COMMENTARY · CL_84695

    Claude Code outperforms OpenAI Codex for production coding tasks

    A team of 12 engineers has found Anthropic's Claude Code to be a superior AI coding assistant compared to OpenAI's Codex for production development. Over three months and 50+ projects, they determined Claude Code is bet…

  16. RESEARCH · CL_79494

    MetaAI Recursive Self-Design Framework Introduced with DGM Benchmark Results

    A new research paper introduces the concept of "MetaAI Recursive Self-Design," defining it as an AI-assisted development pattern where the AI itself modifies its building and improvement mechanisms. The paper proposes a…

  17. TOOL · CL_74420

    New method FuseSearch boosts code localization efficiency

    Researchers have developed FuseSearch, a new method to improve code localization in automated software development. This approach reformulates the task as a joint quality-efficiency optimization, aiming to reduce redund…

  18. RESEARCH · CL_72413

    New methods enhance AI agent reliability and safety

    Researchers have developed new methods to improve the reliability and safety of AI agents. One approach, TRACE, focuses on monitoring long-horizon agent trajectories to detect malicious or unintended behaviors by analyz…

  19. TOOL · CL_70242

    AI agent intervention timing proves unreliable, study finds

    A new research paper explores the challenges of determining when to intervene in autonomous AI agents, particularly during long-horizon tasks. The study found that agents can enter a "saturation trap" where they show no…

  20. TOOL · CL_62924

    CoMem framework decouples context management for faster AI agents

    Researchers have developed CoMem, a new framework that separates context management from an agent's primary workflow, allowing these processes to run concurrently. This asynchronous approach uses a k-step-off pipeline t…