Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents
Researchers have developed PROBE, a new framework designed to improve the recovery process for software engineering agents after failures. PROBE structures telemetry data from failed runs into evidence, diagnoses, and actionable guidance for subsequent attempts. In evaluations, PROBE demonstrated a 65.37% diagnosis accuracy and a 21.79% recovery rate on unresolved cases, significantly outperforming existing methods. A prototype integration with Microsoft's IcM system showed PROBE can enhance existing workflows without altering agent policies or tools.
AI IMPACT: Enhances reliability of AI agents in complex software engineering tasks, potentially reducing manual intervention.
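
To make the evidence, diagnosis, and guidance split described in the summary more concrete, here is a minimal Python sketch. It is an illustration only and not PROBE's actual design: the names Evidence, RecoveryRecord, RULES, and build_record, the telemetry event fields "status" and "message", and the keyword-matching rules are all assumptions introduced for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Evidence:
    step: int    # index of the telemetry event within the failed run
    detail: str  # raw message copied from that event


@dataclass
class RecoveryRecord:
    evidence: list[Evidence] = field(default_factory=list)
    diagnosis: str = "unknown"
    guidance: str = ""


# Hypothetical keyword rules mapping a failure message to (diagnosis, guidance).
# A real system would likely use something far richer than substring matching.
RULES = {
    "timeout": ("tool_timeout", "Retry the tool call with a smaller input or a longer limit."),
    "not found": ("missing_resource", "Verify the path or identifier before calling the tool again."),
}


def build_record(events: list[dict]) -> RecoveryRecord:
    """Turn raw telemetry events from a failed run into a structured record."""
    record = RecoveryRecord()
    for step, event in enumerate(events):
        if event.get("status") != "error":
            continue  # only failing events are kept as evidence
        message = event.get("message", "")
        record.evidence.append(Evidence(step=step, detail=message))
        # The first failing event that matches a rule sets diagnosis and guidance.
        if record.diagnosis == "unknown":
            for keyword, (label, advice) in RULES.items():
                if keyword in message.lower():
                    record.diagnosis, record.guidance = label, advice
                    break
    return record
```

In this sketch the guidance string would be handed to the agent's next attempt as extra context, which is one way such a record could inform a retry without changing the agent's policy or tools.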