A new multi-agent LLM system named Phoenix automates the resolution of GitHub issues, from initial triage to the creation of pull requests. It combines seven layers of safety controls with a baseline-aware testing strategy intended to ensure reliability. Phoenix divides the work among six specialized agents (a planner, a reproducer, a coder, a tester, a failure analyst, and a PR agent), coordinated by a webhook-driven state machine. The system achieved a 75% oracle-resolution rate on a curated SWE-bench Lite dataset and preserved correctness in 100% of the real-world issues it was tested on, although some of its pull requests showed that the planner's localization needs improvement.
IMPACT: This system could significantly streamline software development workflows by automating issue resolution and improving code quality.
RANK_REASON: The item describes a research paper presenting a new multi-agent LLM system for a specific software engineering task.
Source: arXiv cs.MA (Multiagent Systems).
AI-generated summary · Google Gemini · from 1 source.