Decoupled Mixture-of-Experts for Parametric Knowledge Injection

New Research Explores Efficient Mixture-of-Experts Models

By the PulseAugur editorial team · [6 sources] · 2026-06-12 08:21

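Researchers have proposed several new methods for improving the efficiency and capability of Mixture-of-Experts (MoE) language models. One method, Expert Tying, shares expert parameters across Transformer layers to reduce memory footprint with minimal impact on performance; it was evaluated on models including OLMoE, Qwen3, and DeepSeek. Another technique, Mosaic, addresses data and model heterogeneity in federated learning by training a global model through MoE-based data-free knowledge distillation. In addition, Decoupled Mixture-of-Experts (DMoE) offers a modular way to inject external knowledge into large language models (LLMs) without catastrophic forgetting, and a framework called STEM-GNN uses a tokenized MoE to help graph neural networks generalize more robustly.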
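
To make the memory argument behind expert tying concrete, the following is a minimal sketch in plain Python and not the paper's implementation. It assumes that tying means adjacent layers reuse the same expert objects; the `tie_group` parameter, the function names, and the toy sizes are all hypothetical and chosen only for illustration.

```python
import random


def make_expert(d_model, d_hidden, rng):
    """Build one feed-forward expert as two small weight matrices."""
    w_in = [[rng.gauss(0.0, 0.02) for _ in range(d_hidden)] for _ in range(d_model)]
    w_out = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(d_hidden)]
    return {"w_in": w_in, "w_out": w_out}


def count_params(expert):
    """Number of scalar weights in one expert."""
    return sum(len(row) for matrix in expert.values() for row in matrix)


def build_layers(n_layers, n_experts, d_model, d_hidden, tie_group=1, seed=0):
    """Return one expert list per layer.

    tie_group is a hypothetical knob (an assumption, not from the paper):
    every run of `tie_group` adjacent layers reuses the same expert objects.
    tie_group=1 means no tying.
    """
    rng = random.Random(seed)
    layers = []
    shared = None
    for layer_idx in range(n_layers):
        if layer_idx % tie_group == 0:
            # Start a new group with freshly initialized experts.
            shared = [make_expert(d_model, d_hidden, rng) for _ in range(n_experts)]
        layers.append(shared)
    return layers


def unique_expert_params(layers):
    """Count weights once per distinct expert object, however many layers use it."""
    sizes = {}
    for experts in layers:
        for expert in experts:
            sizes[id(expert)] = count_params(expert)
    return sum(sizes.values())


untied = build_layers(n_layers=4, n_experts=8, d_model=16, d_hidden=32, tie_group=1)
tied = build_layers(n_layers=4, n_experts=8, d_model=16, d_hidden=32, tie_group=2)

print(unique_expert_params(untied))  # 4 layers x 8 experts x 1024 weights = 32768
print(unique_expert_params(tied))    # 2 groups x 8 experts x 1024 weights = 16384
```

In this toy setup, tying pairs of layers halves the stored expert weights while each layer still sees a full set of eight experts. Whether the actual method groups layers this way, and how routing interacts with the shared experts, is not stated in the summary above.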

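Impact: These papers explore ways to improve the efficiency, robustness, and knowledge-injection capability of Mixture-of-Experts models, which could lead to more scalable and more capable large language models (LLMs).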

Ranking rationale: Multiple arXiv papers introduce novel methods for Mixture-of-Experts models.


AI-generated summary · Google Gemini · based on 6 sources.

Sources [6]

arXiv cs.AI TIER_1 English(EN) · Martin Jaggi · 2026-06-16 04:00

Closing the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

arXiv:2606.16825v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) architectures efficiently scale Large Language Models (LLMs) by activating only a small fraction of their experts per token, yet the full parameter count - dominated by the expert parameters - must be held…
arXiv cs.AI TIER_1 English(EN) · Junming Liu, Yanting Gao, Yuqi Li, Siyuan Meng, Yifei Sun, Aoqi Wu, Yirong Chen, Ding Wang, Shiping Wen · 2026-06-16 04:00

Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments

arXiv:2505.19699v2 Announce Type: replace-cross Abstract: Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy. However, the coexistence of model and data heterogeneity gives rise t…
arXiv cs.AI TIER_1 English(EN) · Martin Jaggi · 2026-06-15 15:08

Closing the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

Mixture-of-Experts (MoE) architectures efficiently scale Large Language Models (LLMs) by activating only a small fraction of their experts per token, yet the full parameter count - dominated by the expert parameters - must be held in training and inference memory. To address this…
arXiv cs.CL TIER_1 English(EN) · Baoqing Yue, Weihang Su, Qingyao Ai, Yichen Tang, Changyue Wang, Jiacheng Kang, Jingtao Zhan, Yiqun Liu · 2026-06-15 04:00

Decoupled Mixture-of-Experts for Parametric Knowledge Injection

arXiv:2606.14243v1 Announce Type: new Abstract: Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge. Existing approaches typically face a trade-off between flexibility and integration: retrieval-augmented gene…
arXiv cs.LG TIER_1 English(EN) · Xiaoguang Guo, Zehong Wang, Jiazheng Li, Shawn Spitzel, Qi Yang, Kaize Ding, Jundong Li, Chuxu Zhang · 2026-06-15 04:00

Generalizing GNNs with Tokenized Mixture-of-Experts

arXiv:2602.09258v2 Announce Type: replace Abstract: Deployed graph neural networks (GNNs) are frozen at deployment yet must fit clean data, generalize under distribution shifts, and remain stable to perturbations. We show that static inference induces a fundamental tradeoff: impr…
arXiv cs.CL TIER_1 English(EN) · Yiqun Liu · 2026-06-12 08:21

Decoupled Mixture-of-Experts for Parametric Knowledge Injection

Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge. Existing approaches typically face a trade-off between flexibility and integration: retrieval-augmented generation keeps knowledge outside the model but onl…
