On-policy self-distillation

PulseAugur coverage of On-policy self-distillation — every cluster mentioning On-policy self-distillation across labs, papers, and developer communities, ranked by signal.

Total · 90d: 9

Releases · 90d: 0

Papers · 90d: 9

SENTIMENT · 30D: 4 days with sentiment data

RECENT · PAGE 1/1 · 9 TOTAL

RESEARCH · CL_160792 · Jul 23 · 00:00

Visual Contrastive Self-Distillation Improves Qwen VL Models

Researchers have developed Visual Contrastive Self-Distillation (VCSD), a novel method for improving Vision-Language Models (VLMs) without requiring external teachers or privileged information. VCSD works by comparing a…
TOOL · CL_156456 · Jul 22 · 04:00

New distillation method trains LLMs efficiently with soft prompts

Researchers have developed a new method, Multi-Task On-Policy Distillation via Soft-Prompt Privileged Context, to train large language models. This technique uses a teacher model that differs from the st…
TOOL · CL_154090 · Jul 21 · 04:00

New research identifies decoding collapse in AI agent self-distillation

Researchers have identified a failure mode in feedback-augmented self-distillation for retrieval-interleaved search agents, termed decoding collapse. This occurs when models generate diverse-looking but input-agnostic r…
RESEARCH · CL_156464 · Jul 21 · 00:00

New H$^2$SD framework boosts LLM reasoning via hybrid self-distillation

Researchers have developed H$^2$SD, a novel hybrid hindsight self-distillation framework designed to enhance the reasoning abilities of large language models. This method addresses limitations in existing reinforcement …
RESEARCH · CL_141172 · Jul 12 · 15:24

New research tackles pathologies in On-Policy Distillation for LLMs

Researchers have identified and proposed solutions for two key pathologies in On-Policy Distillation (OPD), a technique used in large language model post-training. The first pathology, Student-Teacher Mismatch, occurs w…
RESEARCH · CL_117125 · Jun 23 · 00:00

New research challenges on-policy self-distillation for LLMs, proposing refined methods · 10 sources tracked

Recent research papers explore the limitations and potential improvements of on-policy self-distillation (OPSD) for training large language models (LLMs). Studies indicate that standard OPSD can lead to rote memorizatio…
RESEARCH · CL_90827 · Jun 12 · 00:00

New methods enhance VLM accuracy for GUI grounding tasks · 2 papers

Two new research papers introduce novel methods for improving the accuracy and reliability of vision-language models (VLMs) in GUI grounding tasks. The first paper, "Trust the Right Teacher," proposes quality-aware self…
RESEARCH · CL_79119 · Jun 7 · 00:00

New Trajectory-Refined Distillation improves LLM training

Researchers have introduced Trajectory-Refined Distillation (TRD), a new method to improve the post-training process for large language models. TRD addresses a problem called "prefix failure" in on-policy distillation, …
TOOL · CL_68337 · Jun 3 · 04:00

New distillation method enhances AI safety without sacrificing reasoning

Researchers have developed a new method called Constitutional On-Policy Safe Distillation (COPSD) to improve the safety and helpfulness of AI models. Existing on-policy self-distillation techniques can lead to a collaps…
