PulseAugur

New benchmark YoCausal tests video models' causal understanding

Researchers have introduced YoCausal, a benchmark designed to assess the causal understanding of video diffusion models (VDMs). Inspired by cognitive science principles, the benchmark uses temporally reversed real-world videos as natural counterfactual samples. YoCausal has two levels: the Reverse Surprise Index (RSI), which measures temporal perception, and the Causality Cognition Index (CCI), which uses a visual language model (VLM) to distinguish genuine causal reasoning from mere temporal pattern overfitting. Evaluations of 13 state-of-the-art VDMs indicate a significant gap between the models' ability to perceive the arrow of time and true causal cognition; the models fall short of human-level understanding.
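
To make the "temporally reversed video" idea concrete, here is a minimal sketch of how a forward clip and its reversed counterpart could be paired. It is an illustration only, not the authors' code: the function names `reverse_clip` and `make_counterfactual_pair` are hypothetical, frames are treated as opaque items in a list, and the summary does not say how RSI or CCI are actually computed, so no scoring is shown.

```python
from typing import List, Sequence, Tuple, TypeVar

# A frame can be anything here (an image array, a file path, ...).
# This is an assumption for illustration; the benchmark's data format is not given.
Frame = TypeVar("Frame")


def reverse_clip(frames: Sequence[Frame]) -> List[Frame]:
    """Return a new list with the frames in reverse temporal order."""
    return list(frames)[::-1]


def make_counterfactual_pair(
    frames: Sequence[Frame],
) -> Tuple[List[Frame], List[Frame]]:
    """Pair the original (forward) clip with its time-reversed version.

    Hypothetical helper: the reversed clip plays the role of the
    counterfactual sample described in the summary.
    """
    forward = list(frames)
    return forward, reverse_clip(forward)


if __name__ == "__main__":
    forward, backward = make_counterfactual_pair(["f0", "f1", "f2"])
    print(forward)
    print(backward)
```

Reversing twice recovers the original clip, so each real-world video yields exactly one such pair without any synthetic data.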

IMPACT This benchmark may push video generation models to develop more robust causal reasoning capabilities beyond simple temporal pattern recognition.

RANK_REASON The cluster contains a research paper introducing a new benchmark for evaluating AI models.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    YoCausal: How Far is Video Generation from World Model? A Causality Perspective

    Video diffusion models exhibit arrow-of-time perception without true causal understanding, as demonstrated by a novel benchmark measuring causal cognition through reverse surprise and visual language model analysis.

  2. arXiv cs.CV TIER_1 English(EN) · You-Zhe Xie, Yu-Hsuan Li, Jie-Ying Lee, Kaipeng Zhang, Yu-Lun Liu, Zhixiang Wang ·

    YoCausal: How Far is Video Generation from World Model? A Causality Perspective

    arXiv:2605.30346v1. Abstract: As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existing benchmarks mostly rely on synthetic data, limiting…

  3. arXiv cs.CV TIER_1 English(EN) · Zhixiang Wang ·

    YoCausal: How Far is Video Generation from World Model? A Causality Perspective

    As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existing benchmarks mostly rely on synthetic data, limiting real-world generalization due to the sim-to-rea…