PulseAugur
ENTITY SpecBench

SpecBench

PulseAugur coverage of SpecBench — every cluster mentioning SpecBench across labs, papers, and developer communities, ranked by signal.

Total · 30d: 1 (1 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 1 (1 over 90d)
SENTIMENT · 30D: 1 day with sentiment data

RECENT · PAGE 1/1 · 1 TOTAL
  1. RESEARCH · CL_41781

    New benchmarks tackle AI reward hacking in coding and language agents

    Two new research papers introduce benchmarks for detecting and measuring reward hacking in AI agents, particularly agents performing long-horizon tasks such as coding. The first paper, SpecBench, uses a gap between v…