Brief · PulseAugur

RESEARCH · arXiv cs.AI · 4d · [6 sources]

TIP: Token Importance in On-Policy Distillation

Researchers have proposed several methods to improve on-policy distillation (OPD), a technique for training smaller language models under the supervision of larger ones. TIP identifies informative tokens by analyzing student entropy and teacher-student divergence, and reports significant memory reduction and performance gains. SimCT addresses mismatched tokenizers between teacher and student by expanding the supervision space to include multi-token continuations, recovering lost signal and improving performance on reasoning and code generation tasks. EffOPD accelerates OPD training by optimizing update trajectories and module allocation, yielding a threefold speedup.
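To make the token-selection idea concrete, here is a minimal sketch of scoring tokens by student entropy and teacher-student divergence. It is an illustration under stated assumptions, not TIP's actual procedure: the function names, the use of reverse KL (student relative to teacher), the additive combination of the two signals, and the `keep_fraction` parameter are all hypothetical choices, since the brief does not specify how the signals are combined.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy (in nats) of a distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); q is floored at eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def token_scores(student_logits, teacher_logits):
    """Return (student entropy, KL(student || teacher)) per token position."""
    scores = []
    for s_logits, t_logits in zip(student_logits, teacher_logits):
        s = softmax(s_logits)
        t = softmax(t_logits)
        scores.append((entropy(s), kl_divergence(s, t)))
    return scores

def select_tokens(scores, keep_fraction=0.5):
    """Hypothetical rule: keep the top fraction of positions ranked by
    entropy + divergence. The real method may combine them differently."""
    k = max(1, math.ceil(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i][0] + scores[i][1],
                    reverse=True)
    return sorted(ranked[:k])

# Toy example with a 3-token vocabulary and 3 positions.
student = [[10.0, 0.0, 0.0],  # confident, agrees with teacher
           [0.0, 0.0, 0.0],   # uncertain student (high entropy)
           [5.0, 0.0, 0.0]]   # confident, but disagrees with teacher
teacher = [[10.0, 0.0, 0.0],
           [0.0, 0.0, 0.0],
           [0.0, 5.0, 0.0]]
selected = select_tokens(token_scores(student, teacher), keep_fraction=0.5)
print(selected)
```

In this toy setup the position where the student is confident and already matches the teacher scores lowest, so a loss restricted to the selected positions would skip it. Computing the loss on fewer positions is one plausible route to the kind of memory saving the summary mentions.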

IMPACT These advances offer more efficient and effective ways to train smaller language models, potentially reducing computational costs and improving performance on complex reasoning tasks.

Qwen3
Llama
AIME 2025
Qwen2.5
On-Policy Distillation
MATH-500
DeepSeek-R1-Distill-Llama-8B
SimCT
DeepPlanning
Olmo-3-7B-Think
AIME 2024
EffOPD