
Terminal-Bench

PulseAugur coverage of Terminal-Bench — every cluster mentioning Terminal-Bench across labs, papers, and developer communities, ranked by signal.

Total · 30d: 9 (9 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 4 (4 over 90d)
Sentiment · 30d: 5 days with sentiment data

RECENT · PAGE 1/1 · 9 TOTAL
  1. TOOL · CL_108106

    Sakana Fugu orchestrator models combine LLMs for collective intelligence

    Researchers have developed Sakana Fugu, a family of orchestrator models designed to combine the specialized capabilities of multiple Large Language Models (LLMs) into a collectively intelligent system. These models act …

  2. FRONTIER RELEASE · CL_95424

    Fireworks AI launches GLM-5.2 with 1M context, optimized for coding

    Fireworks AI has launched GLM-5.2, a new frontier model with a 1 million token context window, optimized for coding tasks. The model has undergone independent validation on benchmarks including SWE-bench and GPQA. Firew…

  3. FRONTIER RELEASE · CL_92810

    Z.ai releases GLM-5.2, setting new open-source benchmark for long-context AI

    Z.ai has released GLM-5.2, an open-source language model with a 1 million token context window, positioning it as a strong contender in long-horizon tasks and coding benchmarks. The model features an improved architectu…

  4. RESEARCH · CL_79460

    AI benchmarks hardened against reward hacking with adversarial loops

    Researchers have developed a novel "hacker-fixer loop" to improve the robustness of AI agent benchmarks against reward hacking. This adversarial process uses three LLM agents to iteratively identify and patch vulnerabil…

  5. RESEARCH · CL_72413

    New methods enhance AI agent reliability and safety

    Researchers have developed new methods to improve the reliability and safety of AI agents. One approach, TRACE, focuses on monitoring long-horizon agent trajectories to detect malicious or unintended behaviors by analyz…

  6. SIGNIFICANT · CL_48042

    Fireworks AI enables training of trillion-parameter MoE models

    Fireworks AI has developed a new training infrastructure that enables the fine-tuning of trillion-parameter Mixture-of-Experts (MoE) models, overcoming previous memory and orchestration bottlenecks. This platform was in…

  7. COMMENTARY · CL_20705

    AI models: Choose benchmarks over hype for true performance

    A recent analysis highlights that tech companies often select AI models based on hype rather than performance on relevant benchmarks. The article emphasizes that benchmarks like SWE-bench for coding, Terminal-Bench for …

  8. TOOL · CL_13981

    DeepClaude slashes coding agent costs by 17x using DeepSeek V4 Pro

    An open-source tool called DeepClaude has gained significant traction by allowing developers to use the Claude Code agent loop with DeepSeek V4 Pro instead of Anthropic's models. This swap drastically reduces costs, wit…

  9. RESEARCH · CL_17452

    Public AI models replicate Anthropic's vulnerability discovery findings

    Researchers have successfully replicated Anthropic's Mythos findings using publicly available AI models like GPT-5.4 and Claude Opus 4.6. This suggests that advanced AI capabilities for discovering software vulnerabilit…