Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

By the PulseAugur editorial team · [2 sources] · 2026-05-01 17:54

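Researchers have introduced Persistent Visual Memory (PVM), a new module designed to address "Visual Signal Dilution" in Large Vision-Language Models (LVLMs), a problem in which visual attention weakens as the generated text sequence grows. PVM sits as a parallel branch inside the LVLM architecture and gives visual embeddings a direct pathway, sustaining perception especially in complex reasoning tasks. Experiments on Qwen3-VL models show a notable accuracy improvement with a minimal increase in parameters.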

Impact: Addresses a key limitation of LVLMs and may improve performance on complex multimodal reasoning tasks.

Ranking rationale: This is a research paper introducing a new module for LVLMs.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

arXiv cs.CV TIER_1 English(EN) · Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He, Muxin Fu, Daizong Liu, Wei-Long Zheng, Yu Cheng · 2026-05-04 04:00

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

arXiv:2605.00814v1 Announce Type: new Abstract: While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention parti…
arXiv cs.CV TIER_1 English(EN) · Yu Cheng · 2026-05-01 17:54

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay…
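
The dilution effect described in the abstract can be illustrated with a toy softmax calculation. The sketch below is not the paper's method or code: it assumes a single attention query, a hypothetical count of 64 visual tokens, and equal logits for every token, so the only thing that changes is the size of the softmax normalizer (the partition function) as text history accumulates. The function name `visual_attention_mass` is an illustrative choice.

```python
import math

def visual_attention_mass(visual_logits, text_logits):
    """Fraction of softmax attention mass that lands on visual tokens."""
    all_logits = list(visual_logits) + list(text_logits)
    m = max(all_logits)  # subtract the max for numerical stability
    visual_sum = sum(math.exp(x - m) for x in visual_logits)
    text_sum = sum(math.exp(x - m) for x in text_logits)
    partition = visual_sum + text_sum  # softmax normalizer (partition function)
    return visual_sum / partition

# Assumed toy setup: 64 visual tokens, all logits equal to 0.0.
visual = [0.0] * 64

# Grow the text history and record how much attention the visual tokens keep.
text_lengths = (0, 64, 192, 960)
masses = [visual_attention_mass(visual, [0.0] * n) for n in text_lengths]

for n, mass in zip(text_lengths, masses):
    print(f"text tokens: {n:4d}  visual attention mass: {mass:.4f}")
```

Under these assumptions the visual share is simply 64 / (64 + n), so it falls from 1.0 with no text to 0.0625 once 960 text tokens have accumulated. Real attention logits are not uniform, so this only shows the direction of the effect, not its magnitude in an actual LVLM.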

