Researchers have introduced Persistent Visual Memory (PVM), a module designed to address the "Visual Signal Dilution" problem in Large Vision-Language Models (LVLMs), in which attention to the visual input weakens as the generated text sequence lengthens. PVM acts as a parallel branch within the LVLM architecture, giving visual embeddings a direct pathway so that perception is maintained during generation, especially in complex reasoning tasks. Experiments on Qwen3-VL models showed significant accuracy improvements with minimal added parameters.
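The dilution effect described above can be illustrated with a toy calculation. This is a minimal sketch, not the paper's measurement or the PVM method: it assumes a single softmax attention step in which every visual token shares one logit and every text token shares another, and the function name `visual_attention_share` and the token counts are hypothetical.

```python
import math


def visual_attention_share(n_visual, n_text, visual_logit=0.0, text_logit=0.0):
    """Fraction of softmax attention mass that lands on visual tokens.

    Toy assumption: all visual tokens share one logit and all text tokens
    share another, so the softmax reduces to a ratio of two weighted counts.
    """
    visual_mass = n_visual * math.exp(visual_logit)
    text_mass = n_text * math.exp(text_logit)
    return visual_mass / (visual_mass + text_mass)


# Hypothetical setup: 256 visual tokens, with the text sequence growing.
text_lengths = (0, 256, 1024, 4096)
shares = [visual_attention_share(256, t) for t in text_lengths]

for t, share in zip(text_lengths, shares):
    print(f"text tokens = {t:5d}  visual attention share = {share:.3f}")
```

Under these assumptions the visual share falls monotonically as text tokens accumulate, because the visual token count stays fixed while the softmax denominator keeps growing. A real model's attention logits are learned and uneven, so this only shows the direction of the effect, not its size.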
Impact: Addresses a key limitation in LVLMs, potentially improving performance on complex multimodal reasoning tasks.
Ranking rationale: This is a research paper introducing a new module for LVLMs.
AI-generated summary · Google Gemini · from 2 sources.