On-Policy Distillation
PulseAugur coverage of On-Policy Distillation — every cluster mentioning On-Policy Distillation across labs, papers, and developer communities, ranked by signal.
-
V-Zero framework enables label-free visual reasoning, boosting training speed
Researchers have introduced V-Zero, a novel framework for fine-grained visual reasoning that operates without requiring annotated answer labels. This method utilizes contrastive evidence gating to enhance the model's ab…
-
New framework unifies image generation capabilities; research tackles distillation challenges
Researchers have introduced DanceOPD, a novel on-policy generative field distillation framework designed to unify diverse image generation capabilities like text-to-image, local editing, and global editing within a sing…
-
New RL framework uses language for adaptive guidance; survey covers LLM distillation techniques · 2 sources tracked
Researchers have introduced Hierarchical Reinforcement Learning with Language Instructions (HRLLI), a novel framework that enhances reinforcement learning efficiency by dynamically selecting relevant natural language gu…
-
New SAGE-OPD framework enhances multi-turn LLM agent training
Researchers have developed SAGE-OPD, a novel framework for multi-turn on-policy distillation (OPD) designed to improve the training of language model agents. Unlike previous methods that focused on single-turn settings,…
-
On-Policy Distillation Updates Found to Be Sparse and Geometrically Distinct
A new research paper explores the mechanics of on-policy distillation (OPD), a post-training technique that combines on-policy student trajectories with dense teacher supervision. The study reveals that OPD updates are …
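For readers new to the term, the general OPD recipe described in this blurb (student-sampled trajectories, a teacher signal at every token) can be sketched in a few lines. This is a toy illustration and not the paper's method: the per-token reverse KL objective, the three-token vocabulary, and the names toy_student, toy_teacher, and opd_loss are all assumptions made for the example.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) for a single next-token distribution."""
    return sum(
        p * math.log(p / q)
        for p, q in zip(student_probs, teacher_probs)
        if p > 0
    )


def opd_loss(student_logits_fn, teacher_logits_fn, prompt, steps, rng):
    """Roll out the student, then score every step against the teacher.

    Assumption: "dense teacher supervision" is modeled here as a per-token
    reverse KL averaged over the trajectory.
    """
    tokens = list(prompt)
    per_token = []
    for _ in range(steps):
        s = softmax(student_logits_fn(tokens))
        t = softmax(teacher_logits_fn(tokens))
        per_token.append(reverse_kl(s, t))
        # On-policy: the next token is sampled from the student, not the teacher.
        tokens.append(rng.choices(range(len(s)), weights=s)[0])
    return sum(per_token) / len(per_token), tokens


# Hypothetical toy "models": logits depend only on the parity of the last token.
def toy_student(tokens):
    return [0.0, 1.0, 2.0] if tokens[-1] % 2 == 0 else [2.0, 0.0, 1.0]


def toy_teacher(tokens):
    return [0.0, 0.5, 3.0] if tokens[-1] % 2 == 0 else [3.0, 0.0, 0.5]


loss, trajectory = opd_loss(toy_student, toy_teacher, [0], 5, random.Random(0))
```

In a real setup the two functions would be neural language models and the loss would be backpropagated into the student only; the sketch shows just the data flow.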
-
New Distillation Method Boosts Math Reasoning in AI Models
Researchers have developed Sign-Gated On-Policy Distillation (SG-OPD), an advancement in on-policy distillation techniques. This new method incorporates a binary verifier to filter teacher signals, leading to improved p…
-
New Trajectory-Refined Distillation improves LLM training
Researchers have introduced Trajectory-Refined Distillation (TRD), a new method to improve the post-training process for large language models. TRD addresses a problem called "prefix failure" in on-policy distillation, …
-
New methods boost AI training efficiency for long-horizon reasoning
Researchers have developed new methods to make on-policy distillation (OPD) more efficient for training AI models on long-horizon reasoning tasks. Standard OPD requires full rollouts, which are computationally expensive…
-
New method boosts LLM inference speed with on-policy distillation
Researchers have developed Draft-OPD, a new method to improve the efficiency of speculative decoding in large language models. This technique addresses the mismatch between offline training and real-time inference by us…
-
New methods improve AI model training via selective feedback
Researchers have introduced new methods for on-policy distillation (OPD), a technique used to train student AI models using feedback from a stronger teacher model. Two papers propose focusing supervision on specific, "t…
-
Trust Region On-Policy Distillation
Researchers are exploring advanced techniques in on-policy distillation (OPD) for large language models to improve training stability and efficiency. Several papers introduce methods to refine how teacher models guide s…
-
New methods enhance on-policy distillation for LLM training
Researchers have developed new methods to improve on-policy distillation (OPD), a technique for training smaller language models using larger ones. One approach, TIP, identifies informative tokens by analyzing student e…
-
ProteinOPD framework enhances protein design alignment with 8x speedup
Researchers have developed ProteinOPD, a new framework for aligning protein language models (PLMs) with desired functions. This method adapts pretrained PLMs into specialized teachers and distills their knowledge into a…
-
New methods enhance on-policy distillation for LLMs
Researchers have developed new methods to improve the efficiency and stability of on-policy distillation (OPD) for large language models. One approach, vOPD, uses a control variate baseline derived from the reverse KL d…
-
Researchers refine on-policy distillation for more stable LLM training
Researchers have identified significant empirical failure modes in on-policy distillation (OPD), a technique used for post-training large language models. The standard implementation, which relies on sampled-token log-r…