PulseAugur
Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs

Study finds Video-LLMs suffer from directional motion blindness

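Researchers have identified a notable limitation in current Video Large Language Models (Video-LLMs), termed "directional motion blindness": the models struggle to perceive and report the direction in which an object moves. Although motion-direction information is present in the models' internal states, a "direction binding gap" prevents it from being correctly linked to the language output. To address this, the researchers developed MoDirect, a dataset for fine-tuning and evaluation, and DeltaDirect, a novel objective function. DeltaDirect raises motion-direction accuracy from near chance to above 85% on a synthetic benchmark and improves it by 21.9 percentage points on real-world data.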

Impact: Identifies a key perceptual deficit in Video-LLMs that may affect their reliability on tasks requiring fine-grained temporal understanding.

Ranking rationale: An academic paper detailing a diagnostic method and a fix for a specific failure mode of Video-LLMs.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.CV · TIER_1 · English (EN) · Jongseo Lee, Hyuntak Lee, Sunghun Kim, Sooa Kim, Jihoon Chung, Jinwoo Choi

    Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs

    arXiv:2605.22823v1. Abstract: Video Large Language Models (Video-LLMs) have made rapid progress on temporal video understanding, yet many fail at a basic perceptual primitive: signed image-plane motion direction. On simple videos of a single object moving left, …

  2. arXiv cs.CV · TIER_1 · English (EN) · Jinwoo Choi

    Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs

    Video Large Language Models (Video-LLMs) have made rapid progress on temporal video understanding, yet many fail at a basic perceptual primitive: signed image-plane motion direction. On simple videos of a single object moving left, right, up, or down, most Video-LLMs perform near…