Brief · PulseAugur

RESEARCH · arXiv cs.CV English(EN) · 5d · [2 sources]

Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs

Researchers have identified a limitation in current Video Large Language Models (Video-LLMs), termed "directional motion blindness": the models struggle to perceive and report the direction in which objects move. Motion direction information is present in the models' internal states, but a "direction binding gap" prevents it from being correctly associated with verbal outputs. To address this, the researchers developed MoDirect, a dataset for tuning and evaluation, and DeltaDirect, a novel objective function that raises motion direction accuracy from near chance to over 85% on synthetic benchmarks and improves it by 21.9 points on real-world data.

IMPACT Identifies a critical perceptual flaw in Video-LLMs that may limit their reliability on tasks requiring fine-grained temporal understanding.

Video-LLMs
MoDirect
DeltaDirect