PulseAugur

New Research Explores Efficient Mixture-of-Experts Models

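Researchers have proposed several new methods for improving the efficiency and capabilities of Mixture-of-Experts (MoE) models. "Expert Tying" shares expert parameters across Transformer layers to reduce memory footprint with minimal impact on performance; it was evaluated on models including OLMoE, Qwen3, and DeepSeek. "Mosaic" addresses data and model heterogeneity in federated learning by training a global model through MoE-based, data-free knowledge distillation. "Decoupled Mixture-of-Experts" (DMoE) offers a modular way to inject external knowledge into large language models (LLMs) without catastrophic forgetting. Finally, a framework named STEM-GNN uses a tokenized MoE to help graph neural networks generalize more robustly.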
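To make the memory argument behind Expert Tying concrete, here is a minimal sketch of how sharing one bank of experts across groups of layers shrinks the expert parameter count. It is not the paper's implementation: the grouping scheme (`tie_group` consecutive layers sharing one bank), the gated feed-forward expert shape, and all dimensions are assumptions chosen for illustration.

```python
def expert_params(d_model: int, d_ff: int) -> int:
    """Weights in one gated feed-forward expert (gate, up, down projections; no biases).

    The gated shape is an assumption for illustration, not taken from the source.
    """
    return 3 * d_model * d_ff


def moe_expert_memory(n_layers: int, n_experts: int, d_model: int, d_ff: int,
                      tie_group: int = 1) -> int:
    """Total expert parameters when every `tie_group` consecutive layers share one expert bank.

    tie_group=1 means no sharing. Routers are assumed to stay per-layer and are not counted.
    """
    if n_layers % tie_group != 0:
        raise ValueError("n_layers must be divisible by tie_group")
    n_banks = n_layers // tie_group  # number of distinct expert banks held in memory
    return n_banks * n_experts * expert_params(d_model, d_ff)


# Hypothetical model dimensions, not those of OLMoE, Qwen3, or DeepSeek.
untied = moe_expert_memory(n_layers=16, n_experts=64, d_model=2048, d_ff=1024)
tied = moe_expert_memory(n_layers=16, n_experts=64, d_model=2048, d_ff=1024, tie_group=4)

print(f"untied: {untied:,} expert parameters")
print(f"tied:   {tied:,} expert parameters ({untied // tied}x fewer)")
```

Under these assumptions the saving scales directly with the size of the tying group, while the number of experts activated per token, and therefore per-token compute, is unchanged.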

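Impact: These papers explore ways to improve the efficiency, robustness, and knowledge-injection capabilities of Mixture-of-Experts models, which could lead to more scalable and more capable large language models (LLMs).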

Ranking rationale: Multiple arXiv papers introduce new methods for Mixture-of-Experts models.


AI-generated summary · Google Gemini · from 6 sources.


Sources [6]

  1. arXiv cs.AI TIER_1 English(EN) · Martin Jaggi

    Closing the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

    arXiv:2606.16825v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) architectures efficiently scale Large Language Models (LLMs) by activating only a small fraction of their experts per token, yet the full parameter count - dominated by the expert parameters - must be held…

  2. arXiv cs.AI TIER_1 English(EN) · Junming Liu, Yanting Gao, Yuqi Li, Siyuan Meng, Yifei Sun, Aoqi Wu, Yirong Chen, Ding Wang, Shiping Wen

    Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments

    arXiv:2505.19699v2 Announce Type: replace-cross Abstract: Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy. However, the coexistence of model and data heterogeneity gives rise t…

  3. arXiv cs.AI TIER_1 English(EN) · Martin Jaggi

    Closing the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

    Mixture-of-Experts (MoE) architectures efficiently scale Large Language Models (LLMs) by activating only a small fraction of their experts per token, yet the full parameter count - dominated by the expert parameters - must be held in training and inference memory. To address this…

  4. arXiv cs.CL TIER_1 English(EN) · Baoqing Yue, Weihang Su, Qingyao Ai, Yichen Tang, Changyue Wang, Jiacheng Kang, Jingtao Zhan, Yiqun Liu

    Decoupled Mixture-of-Experts for Parametric Knowledge Injection

    arXiv:2606.14243v1 Announce Type: new Abstract: Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge. Existing approaches typically face a trade-off between flexibility and integration: retrieval-augmented gene…

  5. arXiv cs.LG TIER_1 English(EN) · Xiaoguang Guo, Zehong Wang, Jiazheng Li, Shawn Spitzel, Qi Yang, Kaize Ding, Jundong Li, Chuxu Zhang

    Generalizing GNNs with Tokenized Mixture of Experts

    arXiv:2602.09258v2 Announce Type: replace Abstract: Deployed graph neural networks (GNNs) are frozen at deployment yet must fit clean data, generalize under distribution shifts, and remain stable to perturbations. We show that static inference induces a fundamental tradeoff: impr…

  6. arXiv cs.CL TIER_1 English(EN) · Yiqun Liu

    Decoupled Mixture-of-Experts for Parametric Knowledge Injection

    Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge. Existing approaches typically face a trade-off between flexibility and integration: retrieval-augmented generation keeps knowledge outside the model but onl…