PulseAugur

Orca foundation model learns unified world latent space via Next-State-Prediction · 3 sources tracked

Researchers have introduced Orca, a general world foundation model designed to learn a unified latent space from multimodal data. Unlike models focused on single-modality prediction, Orca uses a Next-State-Prediction approach to model and predict world dynamics. It combines unconscious learning from continuous video with conscious learning from language-described events and VQA supervision, and is trained on 125K hours of video and 160M event annotations. The model is reported to outperform specialized baselines on downstream tasks such as text generation, image prediction, and embodied action generation.

IMPACT Orca's unified world latent space approach could advance multimodal AI understanding and prediction capabilities.

RANK_REASON The cluster describes a new research paper detailing a novel foundation model.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    Orca: The World is in Your Mind

    Orca establishes a unified world latent space through next-state-prediction modeling using multimodal data and demonstrates superior performance in downstream tasks compared to specialized baselines.

  2. arXiv cs.CV TIER_1 English(EN) · Yihao Wang, Yuheng Ji, Mingyu Cao, Yanqing Shen, Runze Xiao, Huaihai Lyu, Senwei Xie, Euan Liu, Klara Tian, Tianfeng Long, Yichi Zhang, Zhengliang Cai, Ruike Chen, Jifan Zhao, Ruochuan Shi, Zihan Tang, Jing Lyu, Wenxing Tan, Ningbo Zhang, Yangtao Hu, Yum… ·

    Orca: The World is in Your Mind

    arXiv:2606.30534v1. Abstract: We introduce Orca, an initial instantiation of a general world foundation model. Orca learns a unified world latent space from multimodal world signals and exposes it through multimodal readout interfaces. Rather than optimizing iso…

  3. arXiv cs.CV TIER_1 English(EN) · Pengwei Wang ·

    Orca: The World is in Your Mind

    We introduce Orca, an initial instantiation of a general world foundation model. Orca learns a unified world latent space from multimodal world signals and exposes it through multimodal readout interfaces. Rather than optimizing isolated next-token, next-frame, or next-action pre…