Researchers have developed new methods to improve on-policy distillation (OPD), a technique in which a smaller student language model is trained on its own generations under supervision from a larger teacher. One approach, TIP, identifies informative tokens by analyzing student entropy and teacher-student divergence, reducing memory use while improving performance. Another method, SimCT, addresses mismatches between teacher and student tokenizers by expanding the supervision space to include multi-token continuations, recovering lost signal and improving performance on reasoning and code generation tasks. Additionally, EffOPD accelerates OPD training by optimizing update trajectories and module allocation, leading to a threefold speedup.
AI IMPACT These advances offer more efficient ways to train smaller language models, potentially reducing computational costs and improving performance on complex reasoning tasks.
RANK_REASON The cluster contains multiple academic papers detailing new methods and theoretical insights into on-policy distillation for large language models.
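The token-selection idea attributed to TIP above (ranking positions by student entropy and teacher-student divergence) can be illustrated with a minimal sketch. This is not the paper's algorithm: the scoring rule (entropy plus KL from student to teacher), the function names, and the `keep_fraction` parameter are all assumptions made for illustration, and the summary does not say how TIP actually combines the two signals.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0) when q has zeros."""
    return sum(x * math.log(x / max(y, eps)) for x, y in zip(p, q) if x > 0.0)


def select_informative_tokens(student_probs, teacher_probs, keep_fraction=0.5):
    """Return sorted indices of the positions with the highest scores.

    Hypothetical scoring rule (an assumption, not taken from the TIP paper):
    score = student entropy + KL(student || teacher) at each position.
    """
    scores = [
        entropy(s) + kl_divergence(s, t)
        for s, t in zip(student_probs, teacher_probs)
    ]
    n_keep = max(1, math.ceil(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy next-token distributions over a 3-word vocabulary at 4 positions.
student = [[0.98, 0.01, 0.01], [0.4, 0.3, 0.3], [0.9, 0.05, 0.05], [0.34, 0.33, 0.33]]
teacher = [[0.97, 0.02, 0.01], [0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.34, 0.33, 0.33]]

# Positions 1 and 2 score highest: the student is either uncertain and off
# target (1) or confidently wrong (2), so a loss restricted to them keeps
# most of the teacher signal in this toy case.
print(select_informative_tokens(student, teacher, keep_fraction=0.5))
```

Computing the distillation loss only on the selected positions is one plausible way such a selection could reduce memory, since teacher logits for the remaining positions would not need to be stored.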
- AIME 2024
- AIME 2025
- DeepPlanning
- DeepSeek-R1-Distill-Llama-8B
- EffOPD
- Llama
- MATH-500
- Olmo-3-7B-Think
- On-Policy Distillation
- Qwen2.5
- Qwen3
- SimCT
AI-generated summary · Google Gemini · from 6 sources.