Pull Requests as a Training Signal for Repo-Level Code Editing
Researchers have developed a method called Clean Pull Request (Clean-PR) for training AI models on repository-level code editing. The approach converts real-world GitHub pull requests into a structured dataset of over 2 million edits across 12 programming languages. Models trained on this data showed significant performance improvements on the SWE-bench benchmark without relying on complex agent scaffolding at inference time.
AI IMPACT: Enhances AI's ability to perform complex, multi-file code modifications, potentially streamlining software development workflows.