SEATS method slashes LLM compute by pruning audio-visual tokens

By PulseAugur Editorial · [2 sources] · 2026-05-19 15:55

Researchers have developed SEATS, a new method for making omni-modal large language models (om-LLMs) more efficient. SEATS prunes redundant audio-visual tokens throughout the model's layers and adapts its token selection according to cross-modal fusion. The approach significantly reduces computational load and speeds up inference while maintaining high performance.
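This summary does not describe SEATS's actual scoring rule or pruning schedule. The following is therefore only a minimal, hypothetical Python sketch of the general idea of layer-wise audio-visual token pruning. Several pieces are assumptions for illustration and do not come from the paper: the function names `select_tokens` and `prune_across_stages`, the per-token relevance scores (for example, attention received from text tokens), the keep ratios, and the switch between per-modality selection (before fusion) and joint selection (after fusion).

```python
from typing import Dict, List, Sequence, Tuple


def select_tokens(
    scores: Sequence[float],
    modalities: Sequence[str],
    keep_ratio: float,
    per_modality: bool,
) -> List[int]:
    """Hypothetical helper: return indices of tokens to keep, in original order.

    scores: an assumed relevance score per token (higher = more useful).
    modalities: a label such as "video" or "audio" for each token.
    per_modality: if True, each modality keeps its own top fraction
        (a stand-in for stages before cross-modal fusion); if False,
        all tokens compete jointly (a stand-in for stages after fusion).
    """
    if len(scores) != len(modalities):
        raise ValueError("scores and modalities must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Group token indices either by modality or into one shared pool.
    groups: Dict[str, List[int]] = {}
    if per_modality:
        for i, m in enumerate(modalities):
            groups.setdefault(m, []).append(i)
    else:
        groups["all"] = list(range(len(scores)))

    kept: List[int] = []
    for idx in groups.values():
        k = max(1, int(len(idx) * keep_ratio))  # always keep at least one token
        top = sorted(idx, key=lambda i: scores[i], reverse=True)[:k]
        kept.extend(top)
    # Restore temporal order so the surviving sequence stays aligned.
    return sorted(kept)


def prune_across_stages(
    scores: Sequence[float],
    modalities: Sequence[str],
    schedule: Dict[str, Tuple[float, bool]],
) -> Dict[str, List[int]]:
    """Hypothetical helper: apply select_tokens stage by stage.

    Each stage keeps a fraction of the tokens that survived the previous
    stage. Scores are fixed here for simplicity; a real model would
    recompute them from each layer's hidden states.
    """
    remaining = list(range(len(scores)))
    history: Dict[str, List[int]] = {}
    for stage, (ratio, per_mod) in schedule.items():
        sub_scores = [scores[i] for i in remaining]
        sub_mods = [modalities[i] for i in remaining]
        local = select_tokens(sub_scores, sub_mods, ratio, per_mod)
        remaining = [remaining[j] for j in local]  # map back to original indices
        history[stage] = remaining
    return history


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.4, 0.8, 0.2, 0.7]
    modalities = ["video", "video", "video", "audio", "audio", "audio"]
    schedule = {"early": (0.75, True), "late": (0.5, False)}
    print(prune_across_stages(scores, modalities, schedule))
```

With these made-up inputs, the early stage keeps the top two tokens from each modality, leaving indices `[0, 2, 3, 5]`. The late stage then keeps the best half of those four jointly, leaving `[0, 3]`. Keeping per-modality budgets early prevents one modality from being pruned away entirely before the model has fused audio and video information. The actual criterion SEATS uses may differ.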

Impact: Reduces computational overhead and speeds up inference for multi-modal LLMs, potentially lowering deployment costs.

Ranking rationale: The cluster contains a research paper detailing a new method for improving LLM efficiency.

Read on Hugging Face Daily Papers.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

Hugging Face Daily Papers · TIER_1 · English (EN) · 2026-05-19 15:55

Stage-Adaptive Token Selection for Efficient Omni-Modal Large Language Models

Omni-modal large language models (om-LLMs) achieve unified audio-visual understanding by encoding video and audio into temporally aligned token sequences interleaved at the window level. However, processing these dense non-textual tokens throughout the LLM incurs substantial comp…
arXiv cs.CV · TIER_1 · English (EN) · Xirong Li · 2026-05-19 15:55

Stage-Adaptive Token Selection for Efficient Omni-Modal Large Language Models

Omni-modal large language models (om-LLMs) achieve unified audio-visual understanding by encoding video and audio into temporally aligned token sequences interleaved at the window level. However, processing these dense non-textual tokens throughout the LLM incurs substantial comp…
