Stage-adaptive Token Selection for Efficient Omni-modal LLMs

SEATS method cuts LLM compute by pruning audio-visual tokens

By the PulseAugur editorial team · [2 sources] · 2026-05-19 15:55

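Researchers have developed a new method, SEATS, to make omni-modal large language models (om-LLMs) more efficient. SEATS prunes redundant audio-visual tokens across the model's layers and adapts its token selection to the degree of cross-modal fusion. The approach substantially reduces the computational load and speeds up inference while maintaining high performance.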
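To make the idea of layer-wise token pruning concrete, here is a minimal Python sketch. It is not the SEATS algorithm: the summary does not describe the scoring criterion or the schedule, so the three-stage keep-ratio schedule, the per-token `scores`, and the function names `keep_ratio_for_layer` and `select_tokens` are all hypothetical and chosen only for illustration. The sketch keeps every text token and only the highest-scoring audio-visual tokens, with a smaller share surviving in deeper layers.

```python
import math
from typing import List


def keep_ratio_for_layer(layer: int, num_layers: int) -> float:
    """Hypothetical stage schedule (an assumption, not taken from the paper):
    keep all non-text tokens in the first third of the layers,
    half in the middle third, and a quarter in the last third."""
    if layer < num_layers // 3:
        return 1.0
    if layer < 2 * num_layers // 3:
        return 0.5
    return 0.25


def select_tokens(scores: List[float], is_text: List[bool], keep_ratio: float) -> List[int]:
    """Return the indices of tokens to keep, in their original order.

    Text tokens are always kept. Non-text (audio/video) tokens are ranked
    by an importance score supplied by the caller, and only the top
    ceil(keep_ratio * count) of them survive.
    """
    non_text = [i for i, text in enumerate(is_text) if not text]
    text = [i for i, t in enumerate(is_text) if t]
    k = math.ceil(len(non_text) * keep_ratio)
    # Rank non-text tokens from highest to lowest score and keep the top k.
    top = sorted(non_text, key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top + text)


# Toy example: four audio-visual tokens followed by two text tokens.
scores = [0.9, 0.1, 0.8, 0.3, 0.5, 0.7]
is_text = [False, False, False, False, True, True]
for layer in (0, 5, 10):
    ratio = keep_ratio_for_layer(layer, num_layers=12)
    print(layer, ratio, select_tokens(scores, is_text, ratio))
```

In a real model the scores would come from the network itself (for example from attention weights), and the pruned sequence would be passed to the next layer. Both details are left out here because the summary does not specify them.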

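Impact: Reduces the computational overhead of multimodal large language models and speeds up inference, which may lower deployment costs.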

Ranking rationale: This cluster contains an academic paper detailing a new method for improving large language model efficiency.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 15:55

Stage-adaptive Token Selection for Efficient Omni-modal LLMs

Omni-modal large language models (om-LLMs) achieve unified audio-visual understanding by encoding video and audio into temporally aligned token sequences interleaved at the window level. However, processing these dense non-textual tokens throughout the LLM incurs substantial comp…
arXiv cs.CV TIER_1 English(EN) · Xirong Li · 2026-05-19 15:55

Stage-adaptive Token Selection for Efficient Omni-modal LLMs

Omni-modal large language models (om-LLMs) achieve unified audio-visual understanding by encoding video and audio into temporally aligned token sequences interleaved at the window level. However, processing these dense non-textual tokens throughout the LLM incurs substantial comp…