ENTITY Terminal-Bench

PulseAugur coverage of Terminal-Bench — every cluster mentioning Terminal-Bench across labs, papers, and developer communities, ranked by signal.

Total · 30d

9 over 90d

Releases · 30d

0 over 90d

Papers · 30d

4 over 90d

TIER MIX · 90D

frontier release 1
significant 2
research 3
tool 2
commentary 1

SENTIMENT · 30D

5 days with sentiment data

RECENT · PAGE 1/1 · 9 TOTAL

TOOL · CL_108106 · Jun 24 · 04:00

Sakana Fugu orchestrator models combine LLMs for collective intelligence

Researchers have developed Sakana Fugu, a family of orchestrator models designed to combine the specialized capabilities of multiple Large Language Models (LLMs) into a collectively intelligent system. These models act …

FRONTIER RELEASE · CL_95424 · Jun 16 · 22:11

Fireworks AI launches GLM-5.2 with 1M context, optimized for coding

Fireworks AI has launched GLM-5.2, a new frontier model with a 1 million token context window, optimized for coding tasks. The model has undergone independent validation on benchmarks including SWE-bench and GPQA. Firew…

FRONTIER RELEASE · CL_92810 · Jun 15 · 23:59

Z.ai releases GLM-5.2, setting new open-source benchmark for long-context AI

Z.ai has released GLM-5.2, an open-source language model with a 1 million token context window, positioning it as a strong contender in long-horizon tasks and coding benchmarks. The model features an improved architectu…

RESEARCH · CL_79460 · Jun 8 · 03:00

AI benchmarks hardened against reward hacking with adversarial loops

Researchers have developed a novel "hacker-fixer loop" to improve the robustness of AI agent benchmarks against reward hacking. This adversarial process uses three LLM agents to iteratively identify and patch vulnerabil…

RESEARCH · CL_72413 · Jun 4 · 09:26

New methods enhance AI agent reliability and safety

Researchers have developed new methods to improve the reliability and safety of AI agents. One approach, TRACE, focuses on monitoring long-horizon agent trajectories to detect malicious or unintended behaviors by analyz…

SIGNIFICANT · CL_48042 · May 18 · 19:53

Fireworks AI enables training of trillion-parameter MoE models

Fireworks AI has developed a new training infrastructure that enables the fine-tuning of trillion-parameter Mixture-of-Experts (MoE) models, overcoming previous memory and orchestration bottlenecks. This platform was in…

COMMENTARY · CL_20705 · May 7 · 04:27

AI models: Choose benchmarks over hype for true performance

A recent analysis highlights that tech companies often select AI models based on hype rather than performance on relevant benchmarks. The article emphasizes that benchmarks like SWE-bench for coding, Terminal-Bench for …

TOOL · CL_13981 · May 3 · 22:13

DeepClaude slashes coding agent costs by 17x using DeepSeek V4 Pro

An open-source tool called DeepClaude has gained significant traction by allowing developers to use the Claude Code agent loop with DeepSeek V4 Pro instead of Anthropic's models. This swap drastically reduces costs, wit…

RESEARCH · CL_17452 · Apr 17 · 14:09

Public AI models replicate Anthropic's vulnerability discovery findings

Researchers have successfully replicated Anthropic's Mythos findings using publicly available AI models like GPT-5.4 and Claude Opus 4.6. This suggests that advanced AI capabilities for discovering software vulnerabilit…
