ENTITY
Multi-Teacher On-Policy Distillation
PulseAugur coverage of Multi-Teacher On-Policy Distillation: every cluster mentioning it across labs, papers, and developer communities, ranked by signal.
Total · 30d: 2 (2 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)
Sentiment · 30d: 2 days with sentiment data
RECENT · 2 TOTAL
- Nemotron 3 Ultra: Open-Source LLM Boasts 1M Context, 6x Throughput
Researchers have introduced Nemotron 3 Ultra, a 550-billion-parameter language model that combines a hybrid Mamba-Transformer architecture with Mixture-of-Experts layers. The model was trained on 20 trillion tokens …
- New distillation method recovers LLM general capabilities after domain specialization
Researchers have developed Counteraction-Aware Multi-Teacher On-Policy Distillation (CaMOPD), a method for recovering general capabilities in large language models (LLMs) after domain s…