What Makes Interaction Trajectories Effective for Training Terminal Agents?
A new research paper examines what makes interaction trajectories effective for training AI agents and finds that a teacher model's standalone performance does not determine how well it teaches. Surprisingly, agents fine-tuned on trajectories from a lower-scoring model, DeepSeek-V3.2, generalized better than agents fine-tuned on trajectories from a higher-scoring model, Claude Opus 4.6. The paper attributes this "pedagogical paradox" to Environment-Grounded Supervision (EGS): trajectories that expose inspect-act-verify behaviors let student models internalize problem-solving routines. The study also reports exceptional data efficiency, with Qwen3-32B achieving state-of-the-art performance using significantly less data.
IMPACT: Suggests a shift in AI agent training from outcome-matching to harness engineering for better generalization.
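
To make the inspect-act-verify idea concrete, here is a minimal Python sketch of how a terminal trajectory could be represented and checked for that pattern. It is an illustration only, not the paper's data format or method: the `Step` record, the phase labels, the `has_inspect_act_verify` helper, and the example commands are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One turn of a terminal trajectory (hypothetical schema, not from the paper)."""
    phase: str        # assumed label: "inspect", "act", or "verify"
    command: str      # shell command the agent issued
    observation: str  # terminal output returned by the environment


def has_inspect_act_verify(steps: List[Step]) -> bool:
    """Return True if the phases contain inspect, then act, then verify, in that order."""
    wanted = ["inspect", "act", "verify"]
    matched = 0
    for step in steps:
        if matched < len(wanted) and step.phase == wanted[matched]:
            matched += 1
    return matched == len(wanted)


# A trajectory that looks at the environment, changes it, then checks the result.
grounded = [
    Step("inspect", "ls", "app.py test_app.py"),
    Step("act", "sed -i 's/retrun/return/' app.py", ""),
    Step("verify", "pytest -q", "1 passed"),
]

# A trajectory that only emits the fix, with no inspection or verification.
outcome_only = [
    Step("act", "sed -i 's/retrun/return/' app.py", ""),
]

print(has_inspect_act_verify(grounded))
print(has_inspect_act_verify(outcome_only))
```

Under these assumptions, a filter like this could separate trajectories that show the full routine from ones that only record the final action, which is the distinction the summary draws between environment-grounded supervision and outcome-matching.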