ProgramBench

PulseAugur coverage of ProgramBench: every cluster mentioning ProgramBench across labs, papers, and developer communities, ranked by signal.

Total · 30d: 1 (4 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 1 (3 over 90d)

RECENT · 4 TOTAL

TOOL · CL_171855 · Jul 30 · 04:00

MindForge pipeline trains small LLMs for full-cycle software engineering

Researchers have developed MindForge, a pipeline for training smaller language models on end-to-end software engineering tasks. This system converts open-source command-line programs into source-free train…

TOOL · CL_94988 · Jun 16 · 13:54

ProgramBench results show Fable 5 at double Opus 4.8's performance

A new benchmark, ProgramBench, has been used to evaluate Fable 5, with results suggesting that Fable 5 significantly outperforms Opus 4.8. The benchmark creator noted that Fable 5's performance was double that of Opus 4.8, even w…

RESEARCH · CL_23515 · May 8 · 17:04

ProgramBench coding benchmark criticized as unsolvable for frontier models due to undocumented tests

A new coding benchmark called ProgramBench, designed to evaluate frontier AI models, has been criticized as potentially impossible to solve. The benchmark requires models to reimplement programs based on limited …

RESEARCH · CL_18314 · May 5 · 09:17

ProgramBench finds language models struggle to build software from scratch

Researchers have introduced ProgramBench, a new benchmark designed to evaluate the holistic software development capabilities of language models. The benchmark challenges AI agents to architect and implement entire code…