MODE: Modality-Decomposed Expert-Level Mixed-Precision Quantization for MoE Multimodal LLMs

New quantization method MODE cuts memory costs for MoE-MLLMs

By the PulseAugur editorial team · [1 source] · 2026-06-17 04:00

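Researchers have introduced MODE, a quantization framework designed to reduce the substantial memory cost of Mixture-of-Experts multimodal large language models (MoE-MLLMs). The framework targets a bias in expert importance estimation that limits the performance of existing methods. By decomposing expert selection frequency by modality and filtering out redundant visual tokens, MODE aims to improve quantization accuracy, particularly for text-critical experts. According to the reported experiments, MODE achieves substantial compression with only a small performance loss, even at extreme bit-width settings.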
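The idea of decomposing expert selection frequency by modality can be illustrated with a small sketch. This is not the paper's algorithm: the function names, the consecutive-duplicate redundancy filter, the `text_weight` blending parameter, and the two-level bit allocation are all hypothetical choices made for illustration, and the routing data is a toy example.

```python
from collections import Counter

def filter_redundant_visual(tokens):
    """Hypothetical redundancy filter (an assumption, not the paper's rule):
    drop a vision token whose expert set repeats that of the previous
    vision token."""
    kept, prev = [], None
    for modality, experts in tokens:
        if modality == "vision":
            key = tuple(sorted(experts))
            if key == prev:
                continue
            prev = key
        kept.append((modality, experts))
    return kept

def modality_frequencies(tokens, num_experts):
    """Per-modality selection frequency: the fraction of tokens of each
    modality that were routed to each expert."""
    counts = {"text": Counter(), "vision": Counter()}
    totals = Counter()
    for modality, experts in tokens:
        totals[modality] += 1
        counts[modality].update(experts)
    return {
        m: [counts[m][e] / totals[m] if totals[m] else 0.0
            for e in range(num_experts)]
        for m in counts
    }

def allocate_bits(freqs, num_high, high_bits=4, low_bits=2, text_weight=0.5):
    """Hypothetical allocation: blend the two frequencies, then give the
    top-scoring experts the higher bit-width."""
    n = len(freqs["text"])
    score = [text_weight * freqs["text"][e]
             + (1 - text_weight) * freqs["vision"][e] for e in range(n)]
    ranked = sorted(range(n), key=lambda e: score[e], reverse=True)
    top = set(ranked[:num_high])
    return [high_bits if e in top else low_bits for e in range(n)]

# Toy routing trace: (modality, experts chosen by the router), top-2 routing.
tokens = ([("text", [0, 1])] * 2
          + [("vision", [2, 3])] * 4
          + [("vision", [1, 3])])

filtered = filter_redundant_visual(tokens)
freqs = modality_frequencies(filtered, num_experts=4)
bits = allocate_bits(freqs, num_high=2, text_weight=0.6)
print(bits)  # [4, 4, 2, 2]
```

In this toy trace, pooled selection counts over all seven tokens would rank experts 3 and 2 highest, because vision tokens outnumber text tokens. Counting per modality after filtering instead keeps the two text-heavy experts (0 and 1) at the higher precision.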

Impact: Reduces the memory footprint of MoE-MLLMs, which could support broader deployment of and experimentation with these powerful models.

Ranking rationale: This cluster contains an academic paper detailing a new technique for AI models.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Yuanteng Chen, Peisong Wang, Zhilei Liu, Nanxin Zeng, Yuantian Shao, Shiqiang Lang, Tao Liu, Chuangyi Li, Qinghao Hu, Gang Li, Jing Liu, Jian Cheng · 2026-06-17 04:00

MODE: Modality-Decomposed Expert-Level Mixed-Precision Quantization for MoE Multimodal LLMs

arXiv:2606.17118v1 Announce Type: cross Abstract: Mixture-of-Experts Multimodal Large Language Models (MoE-MLLMs) offer remarkable performance but incur prohibitive GPU memory costs, making compression essential. Among PTQ methods, expert-level mixed-precision quantization has pr…
