Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning
Researchers have introduced DRIFT-Bench, a new benchmark for analyzing how multi-turn reasoning systems fail. Their findings indicate that these systems fail mainly through "satisfiable drift," in which the system's internal state remains consistent but its output violates commitments made in earlier turns, rather than through outright logical contradiction. The study also highlights MUS-Repair, a method that uses minimal unsatisfiable subsets (MUSes) as feedback, as a strong performer: it significantly reduces contradiction errors and increases the share of residual errors that are satisfiable.
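The following is a minimal sketch under stated assumptions that makes the drift/contradiction distinction and the idea of an MUS concrete. It is not DRIFT-Bench's or MUS-Repair's code. Commitments are modeled as hypothetical boolean constraints over three toy variables, and satisfiability is checked by brute force. The MUS is found with the standard deletion-based algorithm, which tries removing each constraint and keeps the removal if the rest is still unsatisfiable. The `classify_error` labels are an illustrative reading of the definitions above, not the benchmark's actual criteria.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model (not from the paper): each turn adds a named
# boolean constraint over a fixed set of variables.
Assignment = Dict[str, bool]
Constraint = Tuple[str, Callable[[Assignment], bool]]

VARS = ["a", "b", "c"]


def is_satisfiable(constraints: List[Constraint]) -> bool:
    """Brute force: does any assignment to VARS satisfy every constraint?"""
    for values in product([False, True], repeat=len(VARS)):
        assignment = dict(zip(VARS, values))
        if all(pred(assignment) for _, pred in constraints):
            return True
    return False


def deletion_mus(constraints: List[Constraint]) -> List[Constraint]:
    """Deletion-based MUS: drop any constraint whose removal keeps the set unsatisfiable."""
    if is_satisfiable(constraints):
        raise ValueError("input constraints must be unsatisfiable")
    core = list(constraints)
    i = 0
    while i < len(core):
        candidate = core[:i] + core[i + 1:]
        if not is_satisfiable(candidate):
            core = candidate  # constraint i is not needed for the conflict
        else:
            i += 1  # constraint i is necessary; keep it and move on
    return core


def classify_error(commitments: List[Constraint], answer: Assignment) -> str:
    """Illustrative labels: 'contradiction' if the commitments are jointly
    unsatisfiable, 'drift' if they are satisfiable but the answer violates
    one of them, otherwise 'ok'."""
    if not is_satisfiable(commitments):
        return "contradiction"
    if not all(pred(answer) for _, pred in commitments):
        return "drift"
    return "ok"


# Hypothetical four-turn dialogue; the last turn conflicts with the first two.
TURNS: List[Constraint] = [
    ("a", lambda s: s["a"]),
    ("a->b", lambda s: (not s["a"]) or s["b"]),
    ("c", lambda s: s["c"]),
    ("not b", lambda s: not s["b"]),
]

if __name__ == "__main__":
    print([name for name, _ in deletion_mus(TURNS)])  # ['a', 'a->b', 'not b']
    print(classify_error(TURNS[:3], {"a": True, "b": False, "c": True}))  # drift
    print(classify_error(TURNS, {"a": True, "b": True, "c": True}))  # contradiction
```

With all four turns, the commitments are jointly unsatisfiable. The MUS is {a, a->b, not b}, and the unrelated constraint c is dropped, which is what makes an MUS compact feedback. With only the first three turns, the commitments are consistent, so an answer that sets b to false counts as drift rather than contradiction. Brute force is exponential in the number of variables, so a practical system would use a SAT or SMT solver instead.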
IMPACT: The work identifies a critical failure mode in multi-turn AI reasoning and suggests that new validation strategies are needed for reliable system performance.