PulseAugur

New research explores efficient Mixture-of-Experts models

Several recent arXiv papers propose ways to make Mixture-of-Experts (MoE) models more efficient and more capable. "Expert Tying" reduces the memory footprint by sharing expert parameters across transformer layers with minimal impact on performance; it was evaluated on models such as OLMoE, Qwen3, and DeepSeek. "Mosaic" addresses combined data and model heterogeneity in federated learning by using MoE-based, data-free knowledge distillation to train a global model. "Decoupled Mixture-of-Experts" (DMoE) offers a modular way to inject external knowledge into LLMs without catastrophic forgetting. Finally, a framework called STEM-GNN uses tokenized MoEs to make graph neural networks generalize more robustly.
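To make the memory argument behind expert tying concrete, here is a rough back-of-the-envelope sketch. It is not the paper's method: the summary does not say how layers are grouped, so the `tie_group` parameter (consecutive layers sharing one expert bank), the two-matrix expert shape, and all sizes below are illustrative assumptions.

```python
def expert_param_count(num_layers, num_experts, d_model, d_ff, tie_group=1):
    """Count expert parameters when every `tie_group` consecutive layers
    share a single bank of experts.

    Assumptions (hypothetical, for illustration only):
    - each expert is a feed-forward block with two weight matrices,
      d_model x d_ff and d_ff x d_model, and no biases;
    - router/gating weights are ignored;
    - tie_group=1 means no tying (every layer owns its experts).
    """
    if tie_group < 1:
        raise ValueError("tie_group must be >= 1")
    per_expert = 2 * d_model * d_ff
    # Number of distinct expert banks: ceil(num_layers / tie_group).
    num_banks = -(-num_layers // tie_group)
    return num_banks * num_experts * per_expert


# Hypothetical configuration, not taken from any of the evaluated models.
untied = expert_param_count(num_layers=16, num_experts=64, d_model=2048, d_ff=1024)
tied = expert_param_count(num_layers=16, num_experts=64, d_model=2048, d_ff=1024, tie_group=2)
print(f"untied: {untied:,}  tied: {tied:,}  ratio: {tied / untied:.2f}")
```

Under these assumptions, tying pairs of layers halves the expert parameters that must be held in memory, while the number of experts activated per token in each layer is unchanged.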

IMPACT These research papers explore methods to improve the efficiency, robustness, and knowledge injection capabilities of Mixture-of-Experts models, potentially leading to more scalable and capable LLMs.

RANK_REASON Multiple arXiv papers introducing novel methods for Mixture-of-Experts models.

AI-generated summary · Google Gemini · from 6 sources.

COVERAGE [6]

  1. arXiv cs.AI TIER_1 English(EN) · Martin Jaggi ·

    Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

    arXiv:2606.16825v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) architectures efficiently scale Large Language Models (LLMs) by activating only a small fraction of their experts per token, yet the full parameter count - dominated by the expert parameters - must be held…

  2. arXiv cs.AI TIER_1 English(EN) · Junming Liu, Yanting Gao, Yuqi Li, Siyuan Meng, Yifei Sun, Aoqi Wu, Yirong Chen, Ding Wang, Shiping Wen ·

    Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments

    arXiv:2505.19699v2 Announce Type: replace-cross Abstract: Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy. However, the coexistence of model and data heterogeneity gives rise t…

  3. arXiv cs.AI TIER_1 English(EN) · Martin Jaggi ·

    Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

    Mixture-of-Experts (MoE) architectures efficiently scale Large Language Models (LLMs) by activating only a small fraction of their experts per token, yet the full parameter count - dominated by the expert parameters - must be held in training and inference memory. To address this…

  4. arXiv cs.CL TIER_1 English(EN) · Baoqing Yue, Weihang Su, Qingyao Ai, Yichen Tang, Changyue Wang, Jiacheng Kang, Jingtao Zhan, Yiqun Liu ·

    Decoupled Mixture-of-Experts for Parametric Knowledge Injection

    arXiv:2606.14243v1 Announce Type: new Abstract: Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge. Existing approaches typically face a trade-off between flexibility and integration: retrieval-augmented gene…

  5. arXiv cs.LG TIER_1 English(EN) · Xiaoguang Guo, Zehong Wang, Jiazheng Li, Shawn Spitzel, Qi Yang, Kaize Ding, Jundong Li, Chuxu Zhang ·

    Generalizing GNNs with Tokenized Mixture of Experts

    arXiv:2602.09258v2 Announce Type: replace Abstract: Deployed graph neural networks (GNNs) are frozen at deployment yet must fit clean data, generalize under distribution shifts, and remain stable to perturbations. We show that static inference induces a fundamental tradeoff: impr…

  6. arXiv cs.CL TIER_1 English(EN) · Yiqun Liu ·

    Decoupled Mixture-of-Experts for Parametric Knowledge Injection

    Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge. Existing approaches typically face a trade-off between flexibility and integration: retrieval-augmented generation keeps knowledge outside the model but onl…