Multi-Teacher On-Policy Distillation

PulseAugur coverage of Multi-Teacher On-Policy Distillation — every cluster mentioning Multi-Teacher On-Policy Distillation across labs, papers, and developer communities, ranked by signal.

Total · 30 days

Within 90 days: 2

Releases · 30 days

Within 90 days: 0

Papers · 30 days

Within 90 days: 2

Tier distribution · 90 days

Topics

Sentiment · 30 days

2 days with sentiment data

Recent · Page 1 of 1 · 2 items

RESEARCH · CL_93241 · Jun 12 · 00:00

Nemotron 3 Ultra: open-source LLM with million-token context and 6x throughput

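Researchers have released Nemotron 3 Ultra, a 550-billion-parameter language model that combines a hybrid Mamba-Transformer architecture with a Mixture-of-Experts approach. The model was trained on 20 trillion tokens, supports a context length on the order of one million tokens, and uses advanced techniques such as LatentMoE and Multi Token Prediction. Compared with current state-of-the-art models, Nemotron 3 Ultra…

RESEARCH · CL_53546 · May 26 · 14:52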

New distillation method recovers LLM general capabilities after domain specialization

Researchers have developed a new method called Counteraction-Aware Multi-Teacher On-Policy Distillation (CaMOPD) to address the challenge of recovering general capabilities in large language models (LLMs) after domain s…
