Brief · PulseAugur

RESEARCH · Hugging Face Daily Papers English(EN) · 6d · [3 sources]

Trajectory-Refined Distillation

Researchers have introduced Trajectory-Refined Distillation (TRD), a post-training method for large language models. TRD targets "prefix failure" in on-policy distillation, a problem in which dense per-token supervision leads to fragmented gradients. TRD corrects the student model's rollouts at the trajectory level before distillation, which mitigates prefix failure and enhances exploration. The method has demonstrated consistent performance improvements across benchmarks and model scales.
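For readers unfamiliar with the baseline, the sketch below illustrates what "dense per-token supervision" can look like in generic on-policy distillation: the student samples a rollout, and a loss is computed at every token position against the teacher's distribution. It is not TRD itself, and the brief does not specify the loss used. The reverse KL objective and the function names (`log_softmax`, `reverse_kl`, `per_token_distillation_loss`) are assumptions chosen for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position (assumed objective)."""
    lp = log_softmax(student_logits)
    lq = log_softmax(teacher_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))

def per_token_distillation_loss(student_rollout, teacher_rollout):
    """Mean reverse KL over every position of one student-sampled rollout.

    Each argument is a list of per-position logit vectors. The teacher
    logits are assumed to be scored on the same student-generated tokens.
    Returns (mean_loss, per_position_losses).
    """
    losses = [reverse_kl(s, t) for s, t in zip(student_rollout, teacher_rollout)]
    return sum(losses) / len(losses), losses

# Toy rollout: 2 positions, vocabulary of 3 tokens (made-up numbers).
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.3]]
mean_loss, per_position = per_token_distillation_loss(student, teacher)
```

In this toy rollout only the first position contributes to the loss, because the second position's student and teacher distributions already match. Every position receives its own supervision signal, which is the setting in which the brief says prefix failure arises.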

IMPACT Enhances LLM reasoning and accuracy by refining distillation techniques.

Large language models
On-policy distillation
Prefix failure
Trajectory-Refined Distillation
On-policy self-distillation