Video foundation models show emergent intuitive physics understanding

By PulseAugur Editorial · [2 sources] · 2026-06-08 15:40

A new research paper investigates whether video foundation models encode intuitive physics in their representations. The study probes the frozen representations of models such as V-JEPA, VideoMAE, and LTX-Video on the IntPhys2 and Minimal Video Pairs (MVP) benchmarks. V-JEPA performs best, particularly with temporal dynamics probes; VideoMAE is competitive; and LTX-Video shows weaker but present signals. The research also found that physics knowledge is more accessible in the intermediate-to-late layers of these models.
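To make the idea of layerwise frozen-feature probing concrete, here is a minimal sketch. It is not the paper's code: the features are synthetic stand-ins for pooled activations, the probe is a simple ridge-regression classifier, and every name (`fit_linear_probe`, `synthetic_layer_features`, `layerwise_probe`, the `signals` values) is hypothetical. It only illustrates the general recipe of keeping the backbone fixed, training a small probe per layer, and comparing held-out accuracy across layers.

```python
import numpy as np


def fit_linear_probe(feats, labels, l2=1e-2):
    """Fit a ridge-regression linear probe on frozen features (labels in {0, 1})."""
    x = np.hstack([feats, np.ones((len(feats), 1))])  # append a bias column
    y = 2.0 * labels - 1.0                            # map labels to {-1, +1}
    gram = x.T @ x + l2 * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)


def probe_accuracy(weights, feats, labels):
    """Classify by the sign of the probe output and return accuracy."""
    x = np.hstack([feats, np.ones((len(feats), 1))])
    preds = (x @ weights > 0).astype(int)
    return float((preds == labels).mean())


def synthetic_layer_features(labels, signal, dim, rng):
    """Hypothetical stand-in for one layer's pooled activations.

    `signal` controls how linearly accessible the label
    (e.g. physically plausible vs. implausible clip) is in this layer.
    """
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    noise = rng.normal(size=(len(labels), dim))
    return noise + signal * (2.0 * labels[:, None] - 1.0) * direction


def layerwise_probe(signals, n_train=400, n_test=200, dim=32, seed=0):
    """Train one probe per layer and return held-out accuracy for each."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_train + n_test)
    accuracies = []
    for signal in signals:
        feats = synthetic_layer_features(labels, signal, dim, rng)
        weights = fit_linear_probe(feats[:n_train], labels[:n_train])
        accuracies.append(probe_accuracy(weights, feats[n_train:], labels[n_train:]))
    return accuracies


if __name__ == "__main__":
    # Assumed signal strengths: weak in early layers, stronger later on.
    for layer, acc in enumerate(layerwise_probe([0.0, 0.5, 1.5, 3.0])):
        print(f"layer {layer}: probe accuracy = {acc:.3f}")
```

With a real model, `synthetic_layer_features` would be replaced by activations extracted from each layer of the frozen backbone. The probe is kept deliberately simple so that accuracy reflects what the features already encode rather than what the probe itself learns.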

IMPACT Reveals emergent physics understanding in video models, potentially improving their real-world interaction capabilities.

RANK_REASON Research paper analyzing model capabilities.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Samuele Punzo, Niccolò Caselli, Ippokratis Pantelidis, Francesco Massafra, Salvatore Lo Sardo, Mohammadreza Salehi · 2026-06-09 04:00

Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis

arXiv:2606.09646v1. Abstract: We study whether pretrained video foundation models encode intuitive-physics information in their frozen representations, and how this information varies across model families, layers, and probe types. Using frozen-feature probing…

arXiv cs.AI TIER_1 English(EN) · Mohammadreza Salehi · 2026-06-08 15:40

Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis

We study whether pretrained video foundation models encode intuitive-physics information in their frozen representations, and how this information varies across model families, layers, and probe types. Using frozen-feature probing on IntPhys2 and Minimal Video Pairs (MVP), we com…
