ENTITY
On-policy self-distillation
PulseAugur coverage of On-policy self-distillation: every cluster mentioning it across labs, papers, and developer communities, ranked by signal.
Total · 30d: 2 (2 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)
SENTIMENT · 30D: 2 days with sentiment data
RECENT · PAGE 1/1 · 2 TOTAL
- New Trajectory-Refined Distillation improves LLM training
  Researchers have introduced Trajectory-Refined Distillation (TRD), a new method to improve the post-training process for large language models. TRD addresses a problem called "prefix failure" in on-policy distillation, …
- New distillation method enhances AI safety without sacrificing reasoning
  Researchers have developed a new method called Constitutional On-Policy Safe Distillation (COPSD) to improve the safety and helpfulness of AI models. Existing on-policy self-distillation techniques can lead to a collaps…