ENTITY CORE-Bench

PulseAugur coverage of CORE-Bench — every cluster mentioning CORE-Bench across labs, papers, and developer communities, ranked by signal.

Total · 30d

5 over 90d

Releases · 30d

0 over 90d

Papers · 30d

4 over 90d

SENTIMENT · 30D

4 days with sentiment data

RECENT · PAGE 1/1 · 5 TOTAL

TOOL · CL_111644 · Jun 26 · 04:00

New benchmark approach evaluates AI agents beyond accuracy

A new research paper proposes moving beyond accuracy-centric evaluation for AI agents, even when benchmarks saturate. The study uses CORE-Bench Hard, a computational reproducibility benchmark, to demonstrate the value o…
TOOL · CL_108023 · Jun 24 · 04:00

New benchmark CORE-Bench tests AI agents' scientific reproducibility

Researchers have introduced CORE-Bench, a new benchmark designed to evaluate the ability of AI agents to perform computational reproducibility tasks. This benchmark comprises 270 tasks derived from 90 scientific papers …
RESEARCH · CL_80489 · Jun 9 · 08:16

Anthropic AI engineers ship code 8x faster with recursive self-improvement

Anthropic has released data indicating significant advancements in AI development, with their engineers now shipping code eight times faster than in a previous baseline period. The company's AI models, like Claude, are …
RESEARCH · CL_71530 · Jun 4 · 16:20

Anthropic details AI's growing role in its own development

Anthropic has published research indicating that AI systems are increasingly contributing to their own development, a trend they term "recursive self-improvement." This process, where AI assists in designing and develop…
RESEARCH · CL_01095 · Sep 18 · 14:32

AI agents struggle to reproduce research, new benchmarks reveal

Researchers have developed AutoReproduce, a multi-agent framework designed to automatically reproduce AI experiments from research papers. This system utilizes a "paper lineage" to mine implicit knowledge from cited lit…