PulseAugur

SceneScribe-1M dataset offers one million videos for 3D perception and video synthesis

Researchers have introduced SceneScribe-1M, a new large-scale video dataset designed to bridge the gap between 3D geometric perception and video synthesis. The dataset contains one million in-the-wild videos, each annotated with textual descriptions, camera parameters, depth maps, and 3D point tracks. SceneScribe-1M aims to serve as a comprehensive benchmark for perception tasks like depth estimation and scene reconstruction, as well as generative tasks such as text-to-video synthesis.
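To make the annotation types concrete, here is a minimal sketch of what a per-video record could look like. The field names, array shapes, and `VideoAnnotation` class are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class VideoAnnotation:
    """Hypothetical per-video record mirroring the annotation types
    described for SceneScribe-1M (names and shapes are illustrative)."""
    video_id: str
    caption: str            # textual description of the scene
    intrinsics: np.ndarray  # (T, 3, 3) per-frame camera intrinsics
    extrinsics: np.ndarray  # (T, 4, 4) per-frame world-to-camera poses
    depth_maps: np.ndarray  # (T, H, W) per-frame depth
    point_tracks: np.ndarray  # (N, T, 3) 3D trajectories of N tracked points

    def num_frames(self) -> int:
        # All per-frame annotations share the leading time dimension T.
        return self.depth_maps.shape[0]
```

Grouping the geometric annotations (camera, depth, tracks) alongside the caption is what would let a single dataset serve both reconstruction and text-to-video pipelines.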

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Provides a new benchmark dataset for advancing both 3D perception and video generation models.

RANK_REASON This is a research paper describing a new dataset.


COVERAGE [1]

  1. arXiv cs.CV TIER_1 · Yunnan Wang, Kecheng Zheng, Jianyuan Wang, Minghao Chen, David Novotny, Christian Rupprecht, Yinghao Xu, Xing Zhu, Wenjun Zeng, Xin Jin, Yujun Shen

    SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations

    arXiv:2604.07990v2 · Announce Type: replace

    Abstract: The convergence of 3D geometric perception and video synthesis has created an unprecedented demand for large-scale video data that is rich in both semantic and spatio-temporal information. While existing datasets have advanced e…