New Methods Sharply Cut VLM Visual Tokens, Improving Efficiency

By the PulseAugur editorial team · [4 sources] · 2026-06-01 12:24

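Researchers have developed three new methods that substantially compress the visual tokens used by large vision-language models (VLMs), aiming to reduce computational overhead and speed up inference. InfoMerge uses temporal fingerprint differences and content-aware allocation, ETC applies task-aware visual information distillation, and EvoCut analyzes how tokens evolve across multiple layers. All three achieve large reductions in token count, and some deliver significant speedups while retaining more than 98% of the original performance.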
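The summary does not describe the algorithms of InfoMerge, ETC, or EvoCut in any detail, so the following is only a generic illustration of what training-free visual token reduction can look like, not any of the three papers' methods. It is a minimal sketch that assumes each visual token already has an importance score (for example, derived from attention weights). The function name `prune_visual_tokens` and the parameters `scores` and `keep_ratio` are hypothetical.

```python
from typing import List, Sequence, Tuple


def prune_visual_tokens(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    keep_ratio: float,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep the highest-scoring fraction of visual tokens.

    Hypothetical helper for illustration only: `scores` is assumed to be
    a per-token importance value supplied by the caller.
    Returns the kept tokens (in their original order) and their indices.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return [], []

    # Number of tokens to keep, rounded to nearest, but at least one.
    n_keep = max(1, round(len(tokens) * keep_ratio))

    # Rank token indices by descending score (stable for ties).
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)

    # Restore original order so positional structure is preserved.
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept], kept


if __name__ == "__main__":
    toy_tokens = [[0.1], [0.2], [0.3], [0.4]]
    toy_scores = [0.9, 0.1, 0.5, 0.7]
    kept_tokens, kept_idx = prune_visual_tokens(toy_tokens, toy_scores, 0.5)
    print(kept_idx)      # [0, 3]
    print(kept_tokens)   # [[0.1], [0.4]]
```

With a `keep_ratio` of 0.5, the language model would process half as many visual tokens. Real methods differ mainly in how the scores are computed and in whether discarded tokens are merged into the kept ones rather than dropped.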

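Impact: These techniques offer significant efficiency gains for VLMs and could speed the deployment of AI applications involving visual understanding while lowering operating costs.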

Ranking rationale: Three separate research papers propose novel methods for optimizing large vision-language models.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

arXiv cs.CL TIER_1 English(EN) · Xinxin Liu, Shiwei Gan, Xiao Liu, Yafeng Yin, Lei Xie, Sanglu Lu · 2026-06-02 04:00

InfoMerge: Information-Aware Token Compression for Efficient Video Large Language Models

arXiv:2606.02161v1 Announce Type: cross Abstract: Video Large Language Models (Video-LLMs) achieve strong performance in video understanding, but their excessive visual tokens bring substantial computational overhead. Existing training-free compression methods improve inference e…

arXiv cs.CL TIER_1 English(EN) · Sanglu Lu · 2026-06-01 12:24

InfoMerge: Information-Aware Token Compression for Efficient Video Large Language Models

Video Large Language Models (Video-LLMs) achieve strong performance in video understanding, but their excessive visual tokens bring substantial computational overhead. Existing training-free compression methods improve inference efficiency by reducing visual tokens, yet they ofte…

arXiv cs.CV TIER_1 English(EN) · Yiling Gao, Hongchen Wei, Zhenzhong Chen · 2026-06-02 04:00

ETC: Extreme Token Compression in Vision-Language Models via Task-Aware Visual Information Distillation

arXiv:2606.00543v1 Announce Type: new Abstract: In Vision-Language Models (VLMs), high-resolution images produce a large number of visual tokens, resulting in high computational costs and KV-cache overhead during inference. To address this problem, we propose an Extreme Token Com…

arXiv cs.CV TIER_1 English(EN) · Hongyu Lu, Feng Zhang, Wenwei Jin, Huanling Hu, Pengfei Zhang, Yao Hu, Jiawei Li, Shikai Jiang · 2026-06-02 04:00

EvoCut: Multi-Layer Evolution-Aware Visual Token Compression for Efficient Large Vision-Language Models

arXiv:2606.01756v1 Announce Type: new Abstract: Large vision-language models (LVLMs) achieve strong performance on image and video understanding tasks, but their inference efficiency is constrained by the large number of visual tokens produced by vision encoders. Most existing vi…
