Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 16h

What Makes Interaction Trajectories Effective for Training Terminal Agents?

A new research paper examines what makes interaction trajectories effective for training terminal agents. It finds that a model's standalone benchmark score does not determine how well its trajectories teach a student model. Surprisingly, agents fine-tuned on trajectories from a lower-scoring model, DeepSeek-V3.2, generalized better than agents fine-tuned on trajectories from a higher-scoring model, Claude Opus 4.6. The authors attribute this "pedagogical paradox" to Environment-Grounded Supervision (EGS): trajectories that expose inspect-act-verify behaviors let student models internalize problem-solving routines. The study also reports high data efficiency, with Qwen3-32B reaching state-of-the-art performance using significantly less training data.
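To make the phrase "inspect-act-verify" concrete, here is a minimal, hypothetical Python sketch that checks whether a terminal-agent trajectory contains that pattern: an action preceded by some inspection of the environment and followed by a verification step. It assumes each step is already labelled with its role; the `Step` record, the role labels, and `has_inspect_act_verify` are illustrative inventions, not the paper's EGS data format or method.

```python
from dataclasses import dataclass

# Hypothetical role labels; the brief does not describe the paper's trajectory schema.
INSPECT, ACT, VERIFY = "inspect", "act", "verify"


@dataclass
class Step:
    kind: str         # one of INSPECT, ACT, VERIFY (assumed labelling)
    command: str      # shell command the agent issued
    observation: str  # terminal output the environment returned


def has_inspect_act_verify(steps: list[Step]) -> bool:
    """Return True if some ACT step has an INSPECT before it and a VERIFY after it."""
    for i, step in enumerate(steps):
        if step.kind != ACT:
            continue
        inspected_before = any(s.kind == INSPECT for s in steps[:i])
        verified_after = any(s.kind == VERIFY for s in steps[i + 1:])
        if inspected_before and verified_after:
            return True
    return False


trajectory = [
    Step(INSPECT, "ls -la", "config.yaml  run.sh"),
    Step(ACT, "sed -i 's/debug: false/debug: true/' config.yaml", ""),
    Step(VERIFY, "grep debug config.yaml", "debug: true"),
]
print(has_inspect_act_verify(trajectory))  # True
```

A trajectory that only acts (for example, a single `rm` command with no prior inspection and no follow-up check) would return `False` under this sketch, which is the kind of outcome-only supervision the brief contrasts with environment-grounded behavior.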

IMPACT: The findings suggest a shift in AI agent training from outcome matching toward harness engineering as a route to better generalization.

Claude Opus 4.6
DeepSeek-V3.2
Terminal-Bench 2.0
Qwen3-32B
Environment-Grounded Supervision (EGS)
Terminal-Lego