PulseAugur

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

Researchers have introduced Persistent Visual Memory (PVM), a module designed to address "Visual Signal Dilution" in Large Vision-Language Models (LVLMs): as the generated text sequence lengthens, attention to the visual tokens weakens. PVM runs as a parallel branch within the LVLM architecture, giving visual embeddings a direct pathway so the model can sustain perception, especially in complex reasoning tasks. Experiments on Qwen3-VL models showed significant accuracy improvements with minimal added parameters.
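The dilution effect can be illustrated with a toy calculation. The sketch below is not the paper's model and does not implement PVM; it assumes a single attention query whose logits are identical for every visual token and identical for every text token, so the softmax normalizer simply grows with the number of text tokens. The function names `visual_attention_mass` and `dilution_curve` and the choice of 256 visual tokens are hypothetical, picked only for illustration.

```python
import math


def visual_attention_mass(n_visual, n_text, visual_logit=0.0, text_logit=0.0):
    """Fraction of softmax attention that one query places on visual tokens.

    Toy assumption: every visual token shares one logit and every text token
    shares another, so the softmax denominator is a sum of two terms.
    """
    visual = n_visual * math.exp(visual_logit)
    text = n_text * math.exp(text_logit)
    return visual / (visual + text)


def dilution_curve(n_visual, text_lengths, **logits):
    """Visual attention mass for each text length in `text_lengths`."""
    return [visual_attention_mass(n_visual, t, **logits) for t in text_lengths]


if __name__ == "__main__":
    lengths = [0, 256, 1024, 4096, 16384]
    # The visual token count stays fixed while the text history grows.
    for n_text, mass in zip(lengths, dilution_curve(256, lengths)):
        print(f"text tokens = {n_text:6d}  visual attention mass = {mass:.4f}")
```

Under these assumptions the visual share falls monotonically as text accumulates, because the visual term in the numerator is fixed while the denominator keeps growing. Real attention logits are not uniform, so the actual decay in an LVLM may look different.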

Impact: Addresses a key limitation in LVLMs, potentially improving performance on complex multimodal reasoning tasks.

Ranking rationale: This is a research paper introducing a new module for LVLMs.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.CV TIER_1 English(EN) · Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He, Muxin Fu, Daizong Liu, Wei-Long Zheng, Yu Cheng ·

    Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

    arXiv:2605.00814v1 Announce Type: new Abstract: While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention parti…

  2. arXiv cs.CV TIER_1 English(EN) · Yu Cheng ·

    Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

    While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay…