TIP: Token Importance in On-Policy Distillation
Researchers have developed new methods to improve on-policy distillation (OPD), a technique for training smaller language models using larger ones. One approach, TIP, identifies informative tokens by analyzing student entropy and teacher-student divergence, achieving significant memory reduction and performance gains. Another method, SimCT, addresses issues with different tokenizers by expanding the supervision space to include multi-token continuations, recovering lost signal and improving performance on reasoning and code generation tasks. Additionally, EffOPD accelerates OPD training by optimizing update trajectories and module allocation, leading to a threefold speedup. AI
IMPACT These research advancements offer more efficient and effective ways to train smaller language models, potentially reducing computational costs and improving performance on complex reasoning tasks.