Researchers have developed new methods to improve on-policy distillation (OPD), a technique for training smaller language models using larger ones. One approach, TIP, identifies informative tokens by analyzing student entropy and teacher-student divergence, achieving significant memory reduction and performance gains. Another method, SimCT, addresses issues that arise when the teacher and student use different tokenizers: it expands the supervision space to include multi-token continuations, recovering lost signal and improving performance on reasoning and code generation tasks. Additionally, EffOPD accelerates OPD training by optimizing update trajectories and module allocation, leading to a threefold speedup.
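To make the token-selection idea concrete, here is a minimal sketch of scoring each position by student entropy and teacher-student divergence. It is not TIP's actual procedure: the summary does not give the selection rule, so the use of reverse KL (student relative to teacher), the "either threshold" rule, the threshold values, and the names `entropy`, `reverse_kl`, and `select_tokens` are all assumptions made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def reverse_kl(student, teacher, eps=1e-12):
    """KL(student || teacher) in nats; eps guards against log(0).

    Assumption: reverse KL is used as the divergence measure here.
    """
    return sum(
        s * math.log(s / max(t, eps))
        for s, t in zip(student, teacher)
        if s > 0.0
    )


def select_tokens(student_probs, teacher_probs,
                  min_entropy=0.5, min_divergence=0.1):
    """Return indices of positions flagged as informative.

    Hypothetical rule: keep a position if the student is uncertain
    (high entropy) OR disagrees with the teacher (high divergence).
    """
    keep = []
    for i, (s, t) in enumerate(zip(student_probs, teacher_probs)):
        if entropy(s) >= min_entropy or reverse_kl(s, t) >= min_divergence:
            keep.append(i)
    return keep


# Toy example: three positions over a 3-token vocabulary.
student = [
    [0.98, 0.01, 0.01],  # confident and agrees with the teacher
    [0.40, 0.35, 0.25],  # uncertain
    [0.90, 0.05, 0.05],  # confident but disagrees with the teacher
]
teacher = [
    [0.97, 0.02, 0.01],
    [0.45, 0.35, 0.20],
    [0.05, 0.90, 0.05],
]

selected = select_tokens(student, teacher)
print(selected)  # [1, 2]
```

In this toy case the first position is dropped because the student is both confident and close to the teacher, so under this rule only the remaining two positions would contribute to the distillation loss.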
Impact: These research advances offer more efficient and effective ways to train smaller language models, potentially reducing computational costs and improving performance on complex reasoning tasks.
Ranking rationale: The cluster contains multiple academic papers detailing new methods and theoretical insights into on-policy distillation for large language models.
- AIME 2024
- AIME 2025
- DeepPlanning
- DeepSeek-R1-Distill-Llama-8B
- EffOPD
- Llama
- MATH-500
- Olmo-3-7B-Think
- On-Policy Distillation
- Qwen2.5
- Qwen3
- SimCT
AI-generated summary · Google Gemini · from 6 sources.