Researchers have developed ReproRepo, a framework designed to make reproducibility audits of machine learning papers more scalable. It uses GitHub issues as a source of real-world reproduction blockers, reducing the need for manual data curation. In tests with leading LLM agents, including Codex powered by GPT-5.5, at least one relevant issue was identified for approximately 90% of the evaluated papers, even without executing code.
AI IMPACT Enhances scientific rigor by improving the scalability of LLM-assisted reproducibility audits.
RANK_REASON The cluster describes a new framework and evaluation of LLM agents for scientific reproducibility, published on arXiv.
AI-generated summary · Google Gemini · from 1 source.