PulseAugur

Phoenix LLM system automates GitHub issue resolution with safety controls

Phoenix is a new multi-agent LLM system that automates the resolution of GitHub issues, from initial triage through pull-request creation. It combines seven layers of safety controls with a baseline-aware test evaluation strategy. Phoenix splits the work among six specialized agents (a planner, a reproducer, a coder, a tester, a failure analyst, and a PR agent), all coordinated by a webhook state machine. The system achieved a 75% oracle-resolution rate on a curated SWE-bench Lite dataset and maintained 100% correctness preservation on real-world issues, though some pull requests pointed to a need for better planner localization.
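To make the coordination idea concrete, here is a minimal sketch of how a state machine might route work between the six agents. It is an illustration only, not the paper's implementation: the stage names, the transition table, the `MAX_ATTEMPTS` retry budget, and the `run` helper are all assumptions, and the webhook events are simulated with a plain list of success/failure outcomes.

```python
from enum import Enum, auto


class Stage(Enum):
    # One stage per agent named in the summary, plus two terminal states.
    PLAN = auto()
    REPRODUCE = auto()
    CODE = auto()
    TEST = auto()
    ANALYZE_FAILURE = auto()
    OPEN_PR = auto()
    DONE = auto()
    ABANDONED = auto()


MAX_ATTEMPTS = 3  # hypothetical retry budget, not taken from the paper

# Hypothetical "success" transitions between stages.
ON_SUCCESS = {
    Stage.PLAN: Stage.REPRODUCE,
    Stage.REPRODUCE: Stage.CODE,
    Stage.CODE: Stage.TEST,
    Stage.TEST: Stage.OPEN_PR,
    Stage.ANALYZE_FAILURE: Stage.CODE,
    Stage.OPEN_PR: Stage.DONE,
}


def next_stage(stage, ok, failed_tests):
    """Return the stage that follows `stage`, given whether it succeeded."""
    if ok:
        return ON_SUCCESS[stage]
    # A failed test run goes to the failure analyst while budget remains.
    if stage is Stage.TEST and failed_tests < MAX_ATTEMPTS:
        return Stage.ANALYZE_FAILURE
    return Stage.ABANDONED


def run(outcomes):
    """Drive the machine with simulated event outcomes; return visited stages."""
    stage, failed_tests, trace = Stage.PLAN, 0, [Stage.PLAN]
    for ok in outcomes:
        if stage in (Stage.DONE, Stage.ABANDONED):
            break
        if stage is Stage.TEST and not ok:
            failed_tests += 1
        stage = next_stage(stage, ok, failed_tests)
        trace.append(stage)
    return trace
```

In a real deployment each transition would presumably be triggered by an incoming webhook event rather than a list entry; keeping the transition logic in one pure function makes it easy to test and audit separately from the agents themselves.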

IMPACT Automating issue resolution from triage to pull request could reduce the manual effort in software development workflows.

RANK_REASON The item describes a research paper detailing a new multi-agent LLM system for a specific software engineering task.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.MA (Multiagent) TIER_1 (CA) · Joao Barros

    Phoenix: Safe GitHub Issue Resolution via Multi-Agent LLMs

    We present Phoenix, a multi-agent LLM system that resolves GitHub issues from triage through pull-request creation, combining seven layered safety controls with a baseline-aware test evaluation strategy. Phoenix decomposes the work across six specialized agents. Planner, reproduc…