Researchers have developed new self-distillation techniques for large language models (LLMs) to improve their performance without relying on external feedback. AVSD (Adaptive-View Self-Distillation) balances consensus signals across multiple privileged information views with view-specific residuals to enhance learning. Self-Policy Distillation (SPD) extracts a capability subspace from gradients to improve performance and generalizability, particularly in code generation and mathematical reasoning. CEPO (Contrastive Evidence Policy Optimization) sharpens credit assignment at decisive tokens by contrasting correct answers with incorrect ones, improving accuracy on multimodal mathematical reasoning benchmarks.
AI impact: These self-distillation techniques offer improved performance and generalizability for LLMs in complex reasoning tasks without external supervision.
Ranking rationale: The cluster contains multiple research papers detailing novel methods for self-distillation in large language models.
- CEPO
- Contrastive Evidence Policy Optimization
- GRPO
- RLVR
- code-generation benchmarks
- language models
- math competition benchmarks
- Qwen3-4B
- Qwen3-8B
- self-distillation
- AIME24
- AIME25
- Codeforces
- HMMT25
- LiveCodeBench v6
- Self-Policy Distillation
AI-generated summary · Google Gemini · from 5 sources.