PulseAugur
Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models

New research tackles multilingual adaptation in Mixture-of-Experts models

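Two new research papers examine how Mixture-of-Experts (MoE) models can be adapted to multilingual tasks. The first analyzes how language specialization emerges in MoE models during continual pre-training. It finds that the final layers develop language-specific routing and proposes an efficient adaptation strategy that updates only a small number of parameters. The second introduces RA-MoE, a fine-tuning framework that aligns routing patterns across languages to improve performance on non-English downstream tasks, and reports consistent gains over standard fine-tuning methods.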
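The finding that final layers develop language-specific routing can be made concrete with a simple diagnostic. The sketch below is not either paper's method; it assumes top-1 expert choices have already been logged per layer for each language, and it uses the Jensen-Shannon divergence between a language's expert-usage histogram and English's as a hypothetical per-layer specialization score (0 means identical routing, 1 means fully disjoint experts).

```python
import math


def expert_usage(assignments, num_experts):
    """Normalized histogram of top-1 expert choices for one layer."""
    counts = [0] * num_experts
    for e in assignments:
        counts[e] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Skip zero-probability terms; y > 0 wherever x > 0 because y is the mixture m.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def layer_specialization(routing, ref_lang, num_experts):
    """routing[lang][layer] is a list of top-1 expert ids for that language's tokens.

    Returns {lang: [divergence from ref_lang at each layer]} for every non-reference
    language. (Hypothetical diagnostic, not taken from either paper.)
    """
    ref = routing[ref_lang]
    scores = {}
    for lang, layers in routing.items():
        if lang == ref_lang:
            continue
        scores[lang] = [
            js_divergence(
                expert_usage(layers[layer], num_experts),
                expert_usage(ref[layer], num_experts),
            )
            for layer in range(len(layers))
        ]
    return scores


# Toy example: 4 experts, 2 layers. German shares English's experts in layer 0
# but uses a disjoint pair of experts in layer 1.
routing = {
    "en": [[0, 1, 2, 3], [0, 0, 1, 1]],
    "de": [[0, 1, 2, 3], [2, 2, 3, 3]],
}
print(layer_specialization(routing, "en", num_experts=4))  # {'de': [0.0, 1.0]}
```

Under this assumed metric, a profile that rises toward the last layers would match the pattern the summary describes.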

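Impact: These studies offer new techniques for improving the performance and efficiency of multilingual MoE models, which could broaden their use across a wider range of languages.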

Ranking rationale: This cluster contains two academic papers that detail novel methods for adapting MoE models to multilingual tasks.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 5 sources.


Sources (5)

  1. arXiv cs.CL · TIER_1 · English (EN) · Aditi Khandelwal, Marius Mosbach, Verna Dankers, Siva Reddy, Golnoosh Farnadi

    Leveraging Routing Dynamics in Mixture-of-Experts Models for Efficient Language Adaptation

    arXiv:2605.29714v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models are widely used to scale language models, yet their expert routing behavior and adaptation in a multilingual setting remain underexplored. In this work, we study multilingual routing dynamics during c…

  2. arXiv cs.CL · TIER_1 · English (EN) · Golnoosh Farnadi

    Leveraging Routing Dynamics in Mixture-of-Experts Models for Efficient Language Adaptation

    Mixture-of-Experts (MoE) models are widely used to scale language models, yet their expert routing behavior and adaptation in a multilingual setting remain underexplored. In this work, we study multilingual routing dynamics during continual pre-training of an English-centric MoE …

  3. arXiv cs.AI · TIER_1 · English (EN) · Guanzhi Deng, Kuan Wu, Haibo Wang, Shing Yin Wong, Sichun Luo, Linqi Song

    Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models

    arXiv:2605.28306v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) models have emerged as a dominant paradigm for efficient LLM scaling, yet adapting them to non-English downstream tasks remains challenging. Existing fine-tuning approaches treat MoE models as monolithic l…

  4. Hugging Face Daily Papers · TIER_1 · English (EN)

    Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models

    Mixture-of-Experts (MoE) models have emerged as a dominant paradigm for efficient LLM scaling, yet adapting them to non-English downstream tasks remains challenging. Existing fine-tuning approaches treat MoE models as monolithic learners, ignoring the heterogeneous routing struct…

  5. arXiv cs.CL · TIER_1 · English (EN) · Linqi Song

    Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models

    Mixture-of-Experts (MoE) models have emerged as a dominant paradigm for efficient LLM scaling, yet adapting them to non-English downstream tasks remains challenging. Existing fine-tuning approaches treat MoE models as monolithic learners, ignoring the heterogeneous routing struct…
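The RA-MoE abstracts above are truncated, so its actual objective is not given here. As a rough illustration of what "aligning cross-language routing patterns" could mean, the sketch below assumes parallel English and non-English sentences, averages each side's per-token router softmax into one distribution over experts per layer, and penalizes KL(non-English || English). The function names, the choice of KL, and the optional per-layer weights are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def mean_routing(token_logits):
    """Average per-token router probabilities into a single distribution over experts."""
    probs = [softmax(t) for t in token_logits]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]


def kl(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def routing_alignment_loss(src_layers, tgt_layers, layer_weights=None):
    """Hypothetical auxiliary loss.

    src_layers / tgt_layers: per-layer lists of per-token router logits for an
    English sentence (src) and its non-English translation (tgt).
    Returns the weighted sum over layers of KL(tgt routing || src routing).
    """
    if layer_weights is None:
        layer_weights = [1.0] * len(src_layers)
    total = 0.0
    for w, s, t in zip(layer_weights, src_layers, tgt_layers):
        total += w * kl(mean_routing(t), mean_routing(s))
    return total


# One layer, two experts, two tokens per sentence.
english = [[[1.0, 0.0], [0.0, 1.0]]]      # spreads tokens across both experts
translation = [[[5.0, 0.0], [5.0, 0.0]]]  # collapses onto expert 0
print(routing_alignment_loss(english, english))      # 0.0
print(routing_alignment_loss(english, translation))  # positive: routing differs
```

In an actual fine-tuning loop this term would presumably be added to the task loss with a weighting coefficient, with the English side treated as a fixed target (no gradient). `layer_weights` is one assumed way to emphasize the final layers, where the first paper's summary reports language-specific routing.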