PulseAugur / Brief

Last 24 hours, 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
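The brief does not say how the four factors are combined. As an illustration only, here is a minimal Python sketch of one plausible scheme: three normalized signals plus an exponential recency term, combined as a weighted sum and scaled to 0 to 100. The weights, the 12-hour half-life, and the `StoryCluster` fields are hypothetical and are not PulseAugur's actual formula.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 1.0); the real scoring formula is not published here.
WEIGHTS = {
    "authority": 0.35,
    "cluster_strength": 0.30,
    "headline_signal": 0.20,
    "recency": 0.15,
}
HALF_LIFE_HOURS = 12.0  # hypothetical: the recency term halves every 12 hours


@dataclass
class StoryCluster:
    """A deduplicated cluster of articles about one story (hypothetical fields)."""
    authority: float         # 0..1, e.g. reputation of the strongest source
    cluster_strength: float  # 0..1, e.g. how many independent sources report it
    headline_signal: float   # 0..1, e.g. a headline classifier's confidence
    age_hours: float         # hours since the cluster first appeared


def clamp01(x: float) -> float:
    """Clip a component into [0, 1] so no signal can exceed its weight."""
    return min(max(x, 0.0), 1.0)


def recency(age_hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """Exponential time decay: 1.0 for a brand-new story, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / half_life)


def score(c: StoryCluster) -> int:
    """Weighted sum of the four components, scaled to an integer from 0 to 100."""
    parts = {
        "authority": clamp01(c.authority),
        "cluster_strength": clamp01(c.cluster_strength),
        "headline_signal": clamp01(c.headline_signal),
        "recency": recency(c.age_hours),
    }
    return round(100 * sum(WEIGHTS[name] * value for name, value in parts.items()))


if __name__ == "__main__":
    fresh = StoryCluster(authority=0.9, cluster_strength=0.6, headline_signal=0.7, age_hours=6.0)
    stale = StoryCluster(authority=0.9, cluster_strength=0.6, headline_signal=0.7, age_hours=48.0)
    print(score(fresh), score(stale))
```

Making time decay one weighted term, rather than a multiplier on the whole score, keeps a highly authoritative but older story from dropping to zero; a multiplicative decay would be an equally reasonable assumption.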

  1. Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation

    A new research paper examines the mechanics of on-policy distillation (OPD), a post-training technique in which the student model generates its own trajectories and the teacher provides dense, token-level supervision on them. The study finds that OPD updates are small and coordinate-sparse, concentrated mainly in the feed-forward network (FFN) modules. This sparsity is functional: training only the identified subnetwork nearly matches the performance of full training. The updates are also numerically full-rank yet spectrally concentrated, and they do not align with the principal singular subspaces of the original weights. Together, these results suggest that OPD retains the distinctive geometry of on-policy post-training rather than acting as conventional dense rewriting of the parameters.

    IMPACT: Shows that on-policy distillation produces sparse, geometrically distinct parameter updates, suggesting a distinct mechanism by which it edits large models.
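The quantities in this summary can be made concrete with a short NumPy sketch. For the dense teacher signal it assumes a per-token reverse KL, KL(student || teacher), computed on a student-sampled sequence. This is a common choice for on-policy distillation, but the paper's exact objective may differ. For an update delta = W_after - W_before it uses illustrative stand-in metrics that may not match the paper's definitions: entry density for coordinate sparsity, the share of singular-value energy in the top k for spectral concentration, and principal-angle overlap with W's top-k left singular vectors for subspace alignment. All function names, the tolerance, and the synthetic matrices (including the 5% update density) are hypothetical. The random update only exercises the functions and does not reproduce the paper's findings.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def per_token_reverse_kl(student_logits: np.ndarray, teacher_logits: np.ndarray) -> np.ndarray:
    """KL(student || teacher) at every position of a student-sampled sequence.

    Shapes: (seq_len, vocab) -> (seq_len,). One value per token is what makes the
    supervision "dense", versus a single scalar reward per sequence in typical RL.
    """
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)


def coordinate_density(delta: np.ndarray, tol: float = 1e-8) -> float:
    """Fraction of weight entries whose update magnitude exceeds tol."""
    return float((np.abs(delta) > tol).mean())


def top_k_energy(delta: np.ndarray, k: int) -> float:
    """Share of the squared Frobenius norm held by the k largest singular values."""
    s = np.linalg.svd(delta, compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())


def subspace_overlap(w: np.ndarray, delta: np.ndarray, k: int) -> float:
    """Mean squared cosine of the principal angles between the top-k left singular
    subspaces of w and delta: 1.0 if identical, about k / n for random subspaces."""
    u_w = np.linalg.svd(w, full_matrices=False)[0][:, :k]
    u_d = np.linalg.svd(delta, full_matrices=False)[0][:, :k]
    return float(np.linalg.norm(u_w.T @ u_d, "fro") ** 2 / k)


rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128))                 # stand-in for a pretrained FFN weight
mask = rng.random(w.shape) < 0.05                   # hypothetical: 5% of coordinates move
delta = 1e-3 * rng.standard_normal(w.shape) * mask  # small, coordinate-sparse update

print("density:", coordinate_density(delta))
print("numerical rank:", np.linalg.matrix_rank(delta), "of", min(delta.shape))
print("top-8 energy share:", top_k_energy(delta, 8))
print("overlap with W (k=8):", subspace_overlap(w, delta, 8))

student = rng.standard_normal((4, 10))  # 4 sampled tokens over a toy vocabulary of 10
teacher = rng.standard_normal((4, 10))
print("per-token reverse KL:", per_token_reverse_kl(student, teacher))
```

Run per module on checkpoints saved before and after distillation, metrics like these would give the kind of evidence behind statements such as "updates concentrate in FFN modules" or "numerically full-rank but spectrally concentrated"; the thresholds and the choice of k are judgment calls.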