New self-distillation methods boost LLM performance on reasoning tasks

By the PulseAugur editorial team · [5 sources] · 2026-05-19 06:46

Researchers have proposed three new self-distillation techniques for large language models (LLMs) that aim to improve performance without relying on external feedback. AVSD (Adaptive-View Self-Distillation) balances a consensus signal shared across multiple views of privileged information against view-specific residuals. Self-Policy Distillation (SPD) extracts a capability subspace from gradients to improve performance and generalization, particularly in code generation and mathematical reasoning. CEPO (Contrastive Evidence Policy Optimization) sharpens credit assignment at decisive tokens by contrasting correct answers with incorrect ones, improving accuracy on multimodal mathematical reasoning benchmarks.
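As a rough illustration of the general setup that the AVSD abstract describes (the same model acting as student and teacher, with the teacher conditioned on privileged information the student does not see), the sketch below computes a per-token distillation loss on toy logits. It is not the method of any of the three papers: the function names, the toy logits, and the choice of a forward KL divergence KL(teacher || student) are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(teacher_logits, student_logits):
    """Mean per-token KL(teacher || student) over one trajectory.

    Each argument is a list with one logit vector per token position.
    In a self-distillation setup both would come from the same model;
    only the teacher's prompt would include the privileged information.
    """
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


# Hypothetical logits for a 2-token trajectory over a 3-word vocabulary.
# The teacher is sharper because it "saw" the privileged information.
teacher_logits = [[4.0, 0.5, 0.1], [0.2, 3.5, 0.3]]
student_logits = [[1.5, 1.0, 0.8], [0.9, 1.2, 1.0]]

loss = self_distillation_loss(teacher_logits, student_logits)
print(f"distillation loss: {loss:.4f}")
```

The loss is zero when the student already matches the teacher at every position and grows as the two distributions diverge, which is what makes it usable as a training signal that needs no separate teacher model.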

Impact: These self-distillation techniques offer improved performance and generalization for LLMs on complex reasoning tasks without external supervision.

Ranking rationale: The cluster contains multiple research papers detailing novel self-distillation methods for large language models.

AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

arXiv cs.AI TIER_1 · Duy Nguyen, Hanqi Xiao, Archiki Prasad, Zaid Khan, Anirban Das, Austin Zhang, Sambit Sahu, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal · 2026-05-22 04:00

AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals

arXiv:2605.20643v1 · Announce type: cross · Abstract: Self-distillation enables language models to learn on-policy from their own trajectories by using the same model as both student and teacher, with the teacher being conditioned on privileged information unavailable to the student.…

arXiv cs.CL TIER_1 · Guangya Hao, Yitong Shang, Yunbo Long, Zhuokai Zhao, Hanxue Liang · 2026-05-22 04:00

Self-Policy Distillation via Capability-Selective Subspace Projection

arXiv:2605.22675v1 · Announce type: new · Abstract: Self-distillation bootstraps large language models (LLMs) by training on their own generations. However, existing methods either rely on external signals to curate self-generated outputs (e.g., correctness filtering, execution feedb…

arXiv cs.CL TIER_1 · Hanxue Liang · 2026-05-21 16:18

Self-Policy Distillation via Capability-Selective Subspace Projection

Self-distillation bootstraps large language models (LLMs) by training on their own generations. However, existing methods either rely on external signals to curate self-generated outputs (e.g., correctness filtering, execution feedback, and reward search), which are costly and un…

arXiv cs.AI TIER_1 · Mohit Bansal · 2026-05-20 03:06

AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals

Self-distillation enables language models to learn on-policy from their own trajectories by using the same model as both student and teacher, with the teacher being conditioned on privileged information unavailable to the student. Such information can come in different types or v…

arXiv cs.CL TIER_1 · Salman Khan · 2026-05-19 06:46

CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

When a model produces a correct solution under reinforcement learning with verifiable rewards (RLVR), every token receives the same reward signal regardless of whether it was a decisive reasoning step or a grammatical filler. A natural fix is to condition the model on the correct…
