Researchers have introduced YoCausal, a benchmark designed to assess the causal understanding of video diffusion models (VDMs). Inspired by cognitive science principles, the benchmark uses temporally reversed real-world videos as natural counterfactual samples. YoCausal evaluates models at two levels: the Reverse Surprise Index (RSI), which measures temporal perception, and the Causality Cognition Index (CCI), which uses a visual language model (VLM) to distinguish genuine causal reasoning from mere overfitting to temporal patterns. Evaluations of 13 state-of-the-art VDMs indicate a significant gap between their ability to perceive the arrow of time and true causal cognition, and the models fall short of human-level understanding.
AI IMPACT This benchmark may push video generation models to develop more robust causal reasoning capabilities beyond simple temporal pattern recognition.
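As a rough illustration of the temporal-reversal idea described in the summary, the sketch below builds a forward/reversed pair from a clip stored as a NumPy array. It is not the benchmark's actual pipeline: the function names `reverse_video` and `make_counterfactual_pair` and the `(T, H, W, C)` frame layout are assumptions made for this example, and the RSI and CCI scoring steps are not shown because the summary does not define them.

```python
import numpy as np


def reverse_video(frames: np.ndarray) -> np.ndarray:
    """Return the clip with its frames in reverse temporal order.

    Assumes `frames` has shape (T, H, W, C), with time on the first axis.
    """
    if frames.ndim != 4:
        raise ValueError("expected frames with shape (T, H, W, C)")
    # Slicing with a negative step flips the time axis; copy() makes the
    # result an independent array instead of a view on the input.
    return frames[::-1].copy()


def make_counterfactual_pair(frames: np.ndarray) -> dict:
    """Pair a real clip with its time-reversed counterpart (hypothetical helper)."""
    return {"forward": frames, "reversed": reverse_video(frames)}


# Toy clip: 3 frames of 2x2 RGB, where every pixel of frame t holds the value t.
clip = np.stack([np.full((2, 2, 3), t, dtype=np.uint8) for t in range(3)])
pair = make_counterfactual_pair(clip)
```

In this toy clip the first frame of `pair["reversed"]` is the last frame of the original, and reversing twice recovers the input unchanged.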
Read on Hugging Face Daily Papers →
AI-generated summary · Google Gemini · from 3 sources.