A new research paper examines the mechanics of on-policy distillation (OPD), a post-training technique that combines on-policy student trajectories with dense teacher supervision. The study finds that OPD updates are small in magnitude and coordinate-sparse, and that they fall primarily in Feed-Forward Network (FFN) modules. The sparsity is functional: training only the identified subnetwork nearly matches the performance of full training. The updates are numerically full-rank but spectrally concentrated, and they do not align with the principal singular subspaces of the original weights. This suggests that OPD retains the distinctive geometric properties of on-policy post-training rather than acting as standard dense parameter rewriting.
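To make the three properties in the summary concrete (coordinate sparsity, spectral concentration, and alignment with principal singular subspaces), here is a minimal NumPy sketch of how such diagnostics could be computed for a single weight matrix. It is not the paper's code: the helper name `update_diagnostics`, the threshold `tol`, the subspace size `k`, and the synthetic matrices are all assumptions made for illustration, and the paper's exact metrics may differ.

```python
import numpy as np


def update_diagnostics(w_before, w_after, k=8, tol=1e-6):
    """Summarize one weight matrix's update (hypothetical helper, not from the paper)."""
    delta = w_after - w_before
    if not np.any(delta):
        raise ValueError("update is exactly zero; diagnostics are undefined")

    # Coordinate sparsity: fraction of entries whose change exceeds tol.
    changed_fraction = float(np.mean(np.abs(delta) > tol))

    # Singular values of the update: numerical rank vs. energy in the top-k directions.
    s = np.linalg.svd(delta, compute_uv=False)
    numerical_rank = int(np.linalg.matrix_rank(delta))
    top_k_energy = float(np.sum(s[:k] ** 2) / np.sum(s ** 2))

    # Share of the update's energy lying in the top-k left singular
    # subspace of the original weights (1.0 = fully aligned, 0.0 = orthogonal).
    u, _, _ = np.linalg.svd(w_before, full_matrices=False)
    u_k = u[:, :k]
    principal_overlap = float(
        np.linalg.norm(u_k.T @ delta) ** 2 / np.linalg.norm(delta) ** 2
    )

    return {
        "changed_fraction": changed_fraction,
        "numerical_rank": numerical_rank,
        "top_k_energy": top_k_energy,
        "principal_overlap": principal_overlap,
    }


# Synthetic stand-in for a weight matrix and a small, sparse update (assumed data).
rng = np.random.default_rng(0)
w0 = rng.standard_normal((64, 64))
mask = rng.random((64, 64)) < 0.05
w1 = w0 + 1e-3 * rng.standard_normal((64, 64)) * mask

stats = update_diagnostics(w0, w1)
print(stats)
```

On real checkpoints, the same function would be applied per layer to the pre- and post-OPD weights. Under the summary's claims, one would expect a low `changed_fraction`, a high `top_k_energy` despite a high `numerical_rank`, and a low `principal_overlap`.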
IMPACT Reveals that on-policy distillation creates sparse, geometrically distinct parameter updates, suggesting a unique editing mechanism for large models.
Read on Hugging Face Daily Papers →
AI-generated summary · Google Gemini · from 2 sources.