PulseAugur

CORE-Bench

PulseAugur coverage of CORE-Bench — every cluster mentioning CORE-Bench across labs, papers, and developer communities, ranked by signal.

Total · 30d: 5 (5 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 4 (4 over 90d)
[Charts: tier mix (90d), topics, sentiment (30d); 4 days with sentiment data]

RECENT · PAGE 1/1 · 5 TOTAL
  1. TOOL · CL_111644

    New benchmark approach evaluates AI agents beyond accuracy

    A new research paper proposes moving beyond accuracy-centric evaluation for AI agents, even when benchmarks saturate. The study uses CORE-Bench Hard, a computational reproducibility benchmark, to demonstrate the value o…

  2. TOOL · CL_108023

    New benchmark CORE-Bench tests AI agents' scientific reproducibility

    Researchers have introduced CORE-Bench, a new benchmark designed to evaluate the ability of AI agents to perform computational reproducibility tasks. This benchmark comprises 270 tasks derived from 90 scientific papers …

  3. RESEARCH · CL_80489

    Anthropic AI engineers ship code 8x faster with recursive self-improvement

    Anthropic has released data indicating significant advancements in AI development, with their engineers now shipping code eight times faster than in a previous baseline period. The company's AI models, like Claude, are …

  4. RESEARCH · CL_71530

    Anthropic details AI's growing role in its own development

    Anthropic has published research indicating that AI systems are increasingly contributing to their own development, a trend they term "recursive self-improvement." This process, where AI assists in designing and develop…

  5. RESEARCH · CL_01095

    AI agents struggle to reproduce research, new benchmarks reveal

    Researchers have developed AutoReproduce, a multi-agent framework designed to automatically reproduce AI experiments from research papers. This system utilizes a "paper lineage" to mine implicit knowledge from cited lit…