PulseAugur

Terminal Bench 2.0

PulseAugur coverage of Terminal Bench 2.0 — every cluster mentioning Terminal Bench 2.0 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 19 (19 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 10 (10 over 90d)
Sentiment · 30d: 7 days with sentiment data

RECENT · 19 TOTAL
  1. FRONTIER RELEASE · CL_108496 ·

    Alibaba Qwen unveils AgentWorld language model for environment simulation

    Alibaba's Qwen team has introduced Qwen-AgentWorld, a new language world model designed to simulate various agent environments. This model focuses on training LLMs to understand and predict environments, rather than jus…

  2. TOOL · CL_107959 ·

    New LemonHarness framework boosts LLM agent performance on long tasks

    Researchers have developed LemonHarness, a new execution framework designed to improve the stability and performance of large language model (LLM) agents working on extended tasks. The framework establishes explicit exe…

  3. TOOL · CL_107146 ·

    Tmax-27B terminal agent released, optimized for consumer GPUs

    A new terminal agent model named Tmax-27B has been released, built upon Qwen3.6-27B and trained using DPPO for reinforcement learning. This model achieves competitive scores on agentic benchmarks like Terminal Bench 2.0…

  4. TOOL · CL_105288 ·

    Xiaomi launches MiMo Code with persistent memory, claims Claude Code advantage

    Xiaomi has released MiMo Code, an open-source fork of the OpenCode terminal coding agent. This new version introduces a persistent memory system designed to handle long tasks, along with subagent orchestration and intel…

  5. TOOL · CL_93131 ·

    New APEX Framework Enhances AI Agent Self-Improvement

    Researchers have introduced APEX, a novel three-layer framework designed to enhance AI agent self-improvement. Unlike previous methods that focused solely on prompt optimization, APEX simultaneously evolves the agent's …

  6. TOOL · CL_106548 ·

    GeneralVLA-2 enhances robot planning with improved 3D reconstruction and memory

    Researchers have introduced GeneralVLA-2, an advancement in vision-language-action systems designed for robotic planning. The system incorporates GeoFuse-MV3D to enhance 3D reconstruction accuracy by leveraging geometry…

  7. RESEARCH · CL_96078 ·

    GeneralVLA-2 advances robot planning with improved 3D reconstruction and memory

    Researchers have introduced GeneralVLA-2, an advancement in vision-language-action systems designed for robot planning. This system incorporates GeoFuse-MV3D for enhanced 3D reconstruction and an improved KnowledgeBank …

  8. SIGNIFICANT · CL_99036 ·

    Poolside releases Laguna M.1, a 225B MoE model for agentic coding

    Poolside has released Laguna M.1, a 225-billion-parameter Mixture-of-Experts (MoE) model optimized for agentic coding tasks. The model features a large sparse MoE architecture with 256 experts and global attention, enabling i…

  9. TOOL · CL_79558 ·

    Self-Harness enables LLM agents to improve their own operational harnesses

    Researchers have developed a novel method called Self-Harness, enabling LLM-based agents to autonomously improve their own operational harnesses. This iterative process involves identifying model-specific failure patter…

  10. TOOL · CL_68283 ·

    Research: Interaction trajectories boost AI agent generalization

    A new research paper explores the effectiveness of interaction trajectories for training AI agents, finding that standalone performance doesn't dictate teaching efficacy. Surprisingly, agents fine-tuned on trajectories …

  11. TOOL · CL_60204 ·

    AI coding agents: GPT-5.5, Claude Sonnet 4.6, Gemini 3.5 Flash compared

    A recent comparison evaluated three AI coding agents: OpenAI's Codex (powered by GPT-5.5), Anthropic's Claude Code (using Claude Sonnet 4.6), and Google's Antigravity (with Gemini 3.5 Flash). The experiment focused on r…

  12. SIGNIFICANT · CL_56706 ·

    Alibaba's Qwen3.7-Max debuts with 1M context, autonomous coding

    Alibaba has released Qwen3.7-Max, an agent-first LLM with a 1 million token context window, capable of autonomous coding tasks. The model demonstrated a 35-hour coding session without human intervention, optimizing code…

  13. TOOL · CL_35928 ·

    Local LLMs struggle with real-world terminal tasks despite benchmark success

    Local large language models often perform poorly on multi-step terminal tasks despite excelling at standard benchmarks like MMLU. This discrepancy arises because traditional benchmarks measure single-turn reasoning, fai…

  14. TOOL · CL_34986 ·

    Llama.cpp adds MTP, new Gemma-4 finetune released, Qwen 3.6 excels locally

    The llama.cpp project has integrated multi-token prediction (MTP), leading to an 11.5% speed increase for 27B Qwen models in local inference. A new fine-tuned Gemma-4 model, optimized for creative writing and a…

  15. SIGNIFICANT · CL_26039 ·

    Qwen 3.6-Plus excels in complex AI agent tasks and coding

    Alibaba's Qwen 3.6-Plus model has demonstrated advanced capabilities in complex decision-making and agentic coding tasks, according to a recent evaluation. The model successfully generated a detailed implementation plan…

  16. RESEARCH · CL_07734 ·

    Poolside AI releases open-weight Laguna XS.2 and M.1 coding models

    Poolside AI has released two new agentic coding models, Laguna M.1 and Laguna XS.2, along with their agent training and operation runtime. Laguna M.1 is a large Mixture of Experts (MoE) model trained on 30T tokens using…

  17. RESEARCH · CL_47566 ·

    Anthropic's 'Mythos' AI too risky for public release

    Anthropic has developed a new AI model named Claude Mythos, which demonstrates significant advancements in benchmark performance, particularly in identifying software vulnerabilities. Due to its advanced capabilities in…

  18. FRONTIER RELEASE · CL_01718 ·

    Google DeepMind launches Gemini 3 Pro with advanced coding and agentic capabilities

    Google DeepMind has launched Gemini 3 Pro, its latest and most intelligent model, which demonstrates significant improvements in reasoning and coding capabilities. This new model surpasses previous versions and excels…

  19. RESEARCH · CL_99526 ·

    New research explores LLM agent evaluation and improvement techniques

    Researchers are exploring new methods for evaluating and improving Large Language Model (LLM) agents. One paper introduces semantic early-stopping for iterative LLM loops, aiming to reduce token usage by halting when me…