A new benchmark called SpookyBench has been developed to test the temporal understanding of video-language models (VLMs). Researchers found that humans can accurately identify patterns in purely temporal sequences, whereas current state-of-the-art VLMs fail completely. The result points to a critical limitation: VLMs over-rely on spatial features and cannot extract meaning from temporal cues, a problem that worsens as the spatial signal-to-noise ratio decreases.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Highlights a critical limitation in current video-language models, potentially guiding future research towards better temporal reasoning.
RANK_REASON The cluster contains an academic paper introducing a new benchmark for evaluating video-language models.