Brief

last 24h

[2/2] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
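
The brief does not say how the four components are combined into the 0–100 score. As a rough illustration only, the sketch below assumes each of authority, cluster strength, and headline signal is normalized to [0, 1], combines them with hypothetical weights, and applies an exponential time decay with an assumed 24-hour half-life. The function name, weights, and half-life are all assumptions, not the feed's actual formula.

```python
# Hypothetical weights (they sum to 100); the brief does not publish real ones.
WEIGHTS = {"authority": 40, "cluster_strength": 30, "headline_signal": 30}
HALF_LIFE_HOURS = 24.0  # assumed half-life for the time-decay factor


def score_item(authority, cluster_strength, headline_signal, age_hours):
    """Combine three components in [0, 1] into a 0-100 score, decayed by age."""
    components = {
        "authority": authority,
        "cluster_strength": cluster_strength,
        "headline_signal": headline_signal,
    }
    # Weighted sum of the components, each clamped to [0, 1].
    base = sum(
        WEIGHTS[name] * min(max(value, 0.0), 1.0)
        for name, value in components.items()
    )
    # Exponential decay: the score halves every HALF_LIFE_HOURS.
    decay = 0.5 ** (max(age_hours, 0.0) / HALF_LIFE_HOURS)
    return round(base * decay, 1)
```

Under these assumptions, a story with perfect component values scores 100 when fresh and half that one half-life later.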

TOOL · arXiv cs.CV English(EN) · 4d

Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning

Researchers have developed a framework called Counterfactual Relational Policy Optimization (CRPO) to improve the spatiotemporal sensitivity of video large language models (Video LLMs). It targets the tendency of Video LLMs to rely on shortcuts instead of tracking video dynamics. CRPO uses a dual-branch reinforcement learning approach with a novel Counterfactual Relation Reward (CRR) that encourages a model to change its answer when the visual context is altered, discouraging reliance on static cues.

IMPACT This research could lead to more robust video understanding models that truly grasp temporal dynamics, improving applications in video analysis and content understanding.
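
The summary above does not give the actual definition of the Counterfactual Relation Reward. The following is a toy sketch of the general idea only: a model is scored on an original clip and on a counterfactually altered clip, and it earns nothing if it gives the same answer to both when the correct answers differ. The function name, signature, and the 0.5 weights are hypothetical and are not taken from the paper.

```python
def toy_counterfactual_reward(orig_answer, cf_answer, orig_label, cf_label):
    """Toy reward in [0, 1]; NOT the paper's CRR, just an illustration.

    orig_answer / cf_answer: model answers on the original and altered clip.
    orig_label / cf_label: correct answers for the original and altered clip.
    """
    # If the correct answer changed but the model's answer did not,
    # the model is likely leaning on static cues: give no reward.
    if orig_label != cf_label and orig_answer == cf_answer:
        return 0.0
    # Otherwise, reward correctness on each branch equally (assumed weights).
    reward = 0.0
    if orig_answer == orig_label:
        reward += 0.5
    if cf_answer == cf_label:
        reward += 0.5
    return reward
```

For example, answering "left" on the original clip and "right" on a reversed clip (both correct) yields 1.0, while answering "left" on both yields 0.0 even though one of the two answers is correct.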

RESEARCH · arXiv cs.CV English(EN) · 4d · [2 sources]

Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs

Researchers have identified a limitation in current Video Large Language Models (Video-LLMs), termed "directional motion blindness": the models struggle to perceive and articulate the direction in which objects move. Motion direction information is present in the model's internal states, but a "direction binding gap" prevents it from being correctly tied to the verbal output. To address this, the researchers developed MoDirect, a dataset for tuning and evaluation, and DeltaDirect, a novel objective function that raises motion direction accuracy from near chance to over 85% on synthetic benchmarks and by 21.9 points on real-world data.

IMPACT Identifies a critical perceptual flaw in Video-LLMs, potentially impacting their reliability for tasks requiring fine-grained temporal understanding.
