PulseAugur

New framework ReproRepo scales ML reproducibility audits using GitHub issues

Researchers have developed ReproRepo, a framework designed to make reproducibility audits of machine learning papers more scalable. It uses GitHub issues as a source of real-world reproduction blockers, which reduces the need for manual data curation. When tested with leading LLM agents, including Codex powered by GPT-5.5, the framework identified at least one relevant issue for approximately 90% of the evaluated papers, even without executing code.

IMPACT Enhances scientific rigor by improving the scalability of LLM-assisted reproducibility audits.

RANK_REASON The cluster describes a new framework and evaluation of LLM agents for scientific reproducibility, published on arXiv.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI · TIER_1 · English (EN) · Ameet Talwalkar

    ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

    Reproducing research results from papers and released code is central to scientific progress. Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but they are difficult to scale due to their reliance on substantial manual effo…