AI models trained on GitHub pull requests show improved code editing

By PulseAugur Editorial · [1 source] · 2026-06-01 04:00

Researchers have developed a method called Clean Pull Request (Clean-PR) for training AI models on repository-level code editing. The approach converts real-world GitHub pull requests into a structured dataset of more than 2 million edits across 12 programming languages. Models trained on this data showed significant gains on the SWE-bench benchmark without relying on complex agent scaffolding at inference time.

IMPACT Enhances AI's ability to perform complex, multi-file code modifications, potentially streamlining software development workflows.

RANK_REASON Academic paper detailing a new training methodology for AI code editing.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Qinglin Zhu, Tianyu Chen, Shuai Lu, Lei Ji, Runcong Zhao, Murong Ma, Xiangxiang Dai, Yulan He, Lin Gui, Peng Cheng, Yeyun Gong · 2026-06-01 04:00

Pull Requests as a Training Signal for Repo-Level Code Editing

arXiv:2602.07457v2 Announce Type: replace-cross Abstract: Repository-level code editing requires models to understand complex dependencies and execute precise multi-file modifications across a large codebase. While recent gains on SWE-bench rely heavily on complex agent scaffoldi…
