OneCanvas: 3D Scene Understanding via Panoramic Reprojection
Researchers have developed OneCanvas, an approach to 3D scene understanding for vision-language models (VLMs). Instead of relying on complex geometry encoders or extensive training, OneCanvas projects patch features onto a single panoramic canvas and preserves depth information with 3D position embeddings. A pretrained VLM can then process the panoramic representation as a standard image, which enables situated reasoning from any viewpoint and supports a spatial pretraining curriculum. OneCanvas achieves state-of-the-art results on benchmarks such as SQA3D and VSI-Bench while requiring significantly less training compute.
AI IMPACT Simplifies 3D scene understanding for VLMs, potentially reducing training costs and enabling new applications in robotics and embodied AI.
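
To make the panoramic reprojection idea concrete, here is a minimal NumPy sketch of one way patch features with known 3D positions could be scattered onto a single canvas around a chosen viewpoint. The summary does not give the actual procedure, so everything here is an assumption for illustration: the equirectangular layout, the nearest-point-wins occlusion rule, the canvas size, and the hypothetical function name `project_to_panorama`. The returned depth map stands in for the information a 3D position embedding would carry.

```python
import numpy as np

def project_to_panorama(points, feats, viewpoint, height=32, width=64):
    """Scatter patch features onto an equirectangular canvas centred at `viewpoint`.

    points: (N, 3) world-space patch centres; feats: (N, C) patch features.
    Returns (canvas, depth): canvas is (H, W, C), depth is (H, W) with
    np.inf marking cells that no patch landed in.
    Illustrative sketch only, not the OneCanvas implementation.
    """
    points = np.asarray(points, dtype=float)
    feats = np.asarray(feats, dtype=float)
    rel = points - np.asarray(viewpoint, dtype=float)
    dist = np.linalg.norm(rel, axis=1)

    # Drop patches that coincide with the viewpoint (direction undefined).
    keep = dist > 1e-8
    rel, dist, feats = rel[keep], dist[keep], feats[keep]

    # Direction of each patch as seen from the viewpoint.
    azimuth = np.arctan2(rel[:, 1], rel[:, 0])                    # [-pi, pi]
    elevation = np.arcsin(np.clip(rel[:, 2] / dist, -1.0, 1.0))   # [-pi/2, pi/2]

    # Map angles to pixel coordinates: columns wrap around, rows are clamped.
    col = ((azimuth + np.pi) / (2 * np.pi) * width).astype(int) % width
    row = ((np.pi / 2 - elevation) / np.pi * height).astype(int)
    row = np.clip(row, 0, height - 1)

    canvas = np.zeros((height, width, feats.shape[1]))
    depth = np.full((height, width), np.inf)

    # Simple z-buffer: when several patches hit one cell, keep the nearest.
    for r, c, d, f in zip(row, col, dist, feats):
        if d < depth[r, c]:
            depth[r, c] = d
            canvas[r, c] = f
    return canvas, depth
```

Because the viewpoint is just an argument, the same set of patches can be re-rendered from any position in the scene, which is the property the summary describes as situated reasoning from any viewpoint.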