Brief · PulseAugur

RESEARCH · arXiv cs.AI · English (EN) · 16h · [2 sources]

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

OneCanvas is an approach to 3D scene understanding for vision-language models (VLMs) that avoids both dedicated geometry encoders and extensive training. It reprojects patch features onto a single panoramic canvas and preserves depth information through 3D position embeddings. Because a pretrained VLM can process this panoramic representation as a standard image, the method supports situated reasoning from any viewpoint and lends itself to a spatial pretraining curriculum. The authors report state-of-the-art results on benchmarks including SQA3D and VSI-Bench with significantly less training compute.
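
To make the reprojection idea concrete, here is a minimal sketch of mapping 3D points onto an equirectangular (panoramic) grid centered on a chosen viewpoint, keeping the nearest point per cell. It is not the OneCanvas implementation: the function name `project_to_panorama`, the grid size, the axis convention (z up), and the nearest-point rule are all assumptions made for illustration, and the brief does not say how the method resolves overlaps or builds its position embeddings.

```python
import math

def project_to_panorama(points, viewpoint, width=8, height=4):
    """Hypothetical helper: place 3D points on an equirectangular canvas.

    Returns {(row, col): (depth, point_index)}, keeping the nearest point
    per cell. The kept depth stands in for the 3D position information
    that the brief says is preserved alongside each patch feature.
    """
    canvas = {}
    vx, vy, vz = viewpoint
    for idx, (x, y, z) in enumerate(points):
        # Express the point relative to the chosen viewpoint.
        dx, dy, dz = x - vx, y - vy, z - vz
        depth = math.sqrt(dx * dx + dy * dy + dz * dz)
        if depth == 0.0:
            continue  # a point at the viewpoint has no direction
        # Spherical angles (assumed convention: z is up).
        azimuth = math.atan2(dy, dx)                    # in [-pi, pi]
        ratio = max(-1.0, min(1.0, dz / depth))         # guard against rounding
        elevation = math.asin(ratio)                    # in [-pi/2, pi/2]
        # Angles to grid cell: columns wrap around, rows are clamped.
        col = int((azimuth + math.pi) / (2 * math.pi) * width) % width
        row = min(int((math.pi / 2 - elevation) / math.pi * height), height - 1)
        # Keep only the closest point that lands in each cell.
        if (row, col) not in canvas or depth < canvas[(row, col)][0]:
            canvas[(row, col)] = (depth, idx)
    return canvas
```

Calling the function again with a different `viewpoint` re-centers the same scene on a new canvas, which is one way to picture the "any viewpoint" property described above.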

IMPACT Simplifies 3D scene understanding for VLMs, potentially reducing training costs and enabling new applications in robotics and embodied AI.

arXiv
vision-language model
VSI-Bench
SQA3D
OneCanvas
Bartłomiej Baranowski