New benchmark YoCausal tests video models' causal understanding

By PulseAugur Editorial · [3 sources] · 2026-05-28 00:00

Researchers have introduced YoCausal, a benchmark for assessing the causal understanding of video diffusion models (VDMs). Inspired by cognitive science principles, it uses temporally reversed real-world videos as natural counterfactual samples. YoCausal has two levels: the Reverse Surprise Index (RSI), which measures temporal perception, and the Causality Cognition Index (CCI), which uses a visual language model (VLM) to distinguish genuine causal reasoning from mere temporal pattern overfitting. Evaluations of 13 state-of-the-art VDMs indicate a significant gap between the models' ability to perceive the arrow of time and true causal cognition, with the models falling short of human-level understanding.
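To make the reversal idea concrete, here is a minimal sketch, assuming a clip can be treated as an ordered sequence of frames. The function name `make_counterfactual_pair` and the toy frame labels are hypothetical illustrations. The sketch only shows how a forward clip and its time-reversed counterpart can be paired; it does not reproduce how YoCausal computes RSI or CCI, which the summary does not specify.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_counterfactual_pair(
    frames: Sequence[Frame],
) -> Tuple[List[Frame], List[Frame]]:
    """Return (forward, reversed) copies of a clip.

    Hypothetical helper: the reversed copy plays the same frames in the
    opposite temporal order, which is the kind of sample the summary
    describes as a natural counterfactual.
    """
    forward = list(frames)
    backward = forward[::-1]  # flip the temporal order only
    return forward, backward


# Toy clip: string labels stand in for real image frames (assumption).
clip = ["glass_on_table", "glass_falling", "glass_shattered"]
forward, backward = make_counterfactual_pair(clip)
print(forward)
print(backward)
```

In the reversed clip the glass un-shatters, an ordering that a model with causal understanding would be expected to find less plausible than the forward one.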

IMPACT This benchmark may push video generation models to develop more robust causal reasoning capabilities beyond simple temporal pattern recognition.

RANK_REASON The cluster contains a research paper introducing a new benchmark for evaluating AI models.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-28 00:00

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

Video diffusion models exhibit arrow-of-time perception without true causal understanding, as demonstrated by a novel benchmark measuring causal cognition through reverse surprise and visual language model analysis.

arXiv cs.CV TIER_1 English(EN) · You-Zhe Xie, Yu-Hsuan Li, Jie-Ying Lee, Kaipeng Zhang, Yu-Lun Liu, Zhixiang Wang · 2026-05-29 04:00

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

arXiv:2605.30346v1. Abstract: As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existing benchmarks mostly rely on synthetic data, limiting…

arXiv cs.CV TIER_1 English(EN) · Zhixiang Wang · 2026-05-28 17:59

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existing benchmarks mostly rely on synthetic data, limiting real-world generalization due to the sim-to-rea…
