OneCanvas: 3D Scene Understanding via Panoramic Reprojection

OneCanvas simplifies 3D scene understanding for vision-language models (VLMs) through panoramic reprojection

By the PulseAugur editorial team · [2 sources] · 2026-06-17 16:29

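Researchers have developed OneCanvas, a new method for 3D scene understanding in vision-language models (VLMs). Rather than relying on complex geometry encoders or large amounts of training, OneCanvas projects image patch features onto a single panoramic canvas and preserves depth information through 3D positional embeddings. This lets pretrained VLMs process the panoramic representation as a standard image, enabling situated reasoning from any viewpoint and supporting a spatial pretraining curriculum. OneCanvas achieves state-of-the-art results on benchmarks such as SQA3D and VSI-Bench while requiring substantially less training compute.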
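
The aggregation step described above can be pictured with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `reproject_to_panorama`, the longitude/latitude canvas layout, the y-up axis convention, the canvas size, and the nearest-patch-wins rule for cells hit by several patches are all hypothetical choices. The sketch also assumes each patch already has a 3D world position (for example from depth and camera pose), and it returns a per-cell range map only as a stand-in for the depth signal that OneCanvas carries through its 3D positional embeddings.

```python
import numpy as np


def reproject_to_panorama(points, feats, origin, height=32, width=64):
    """Scatter patch features onto a longitude/latitude canvas centered at `origin`.

    points: (N, 3) world-space patch positions (assumed to be given)
    feats:  (N, C) patch features
    origin: (3,) viewpoint the panorama is rendered from
    Returns (canvas, rng): canvas is (height, width, C); rng is (height, width)
    and holds the distance of the patch kept in each cell (inf where empty).
    """
    rel = points - origin
    dist = np.linalg.norm(rel, axis=1)

    # Drop patches that coincide with the viewpoint (direction undefined).
    valid = dist > 1e-8
    rel, dist, feats = rel[valid], dist[valid], feats[valid]

    # Assumed convention: y is up, longitude is measured from +z towards +x.
    lon = np.arctan2(rel[:, 0], rel[:, 2])                   # in [-pi, pi]
    lat = np.arcsin(np.clip(rel[:, 1] / dist, -1.0, 1.0))    # in [-pi/2, pi/2]

    # Map angles to integer canvas cells (row 0 is straight up).
    col = ((lon + np.pi) / (2 * np.pi) * width).astype(int) % width
    row = np.clip(((np.pi / 2 - lat) / np.pi * height).astype(int), 0, height - 1)
    cell = row * width + col

    # Hypothetical collision rule: keep the nearest patch per cell.
    # Sort by cell, then by distance; the first entry of each cell is nearest.
    order = np.lexsort((dist, cell))
    uniq_cells, first = np.unique(cell[order], return_index=True)
    keep = order[first]

    flat_canvas = np.zeros((height * width, feats.shape[1]), dtype=feats.dtype)
    flat_rng = np.full(height * width, np.inf)
    flat_canvas[uniq_cells] = feats[keep]
    flat_rng[uniq_cells] = dist[keep]

    return flat_canvas.reshape(height, width, -1), flat_rng.reshape(height, width)
```

Calling `reproject_to_panorama(points, feats, origin=np.zeros(3))` yields an image-shaped feature grid, which is the property the summary highlights: a pretrained model that expects a 2D grid of patch features can consume it without a dedicated geometry encoder. Changing `origin` re-renders the same scene from a different viewpoint.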

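Impact: Simplifies 3D scene understanding for vision-language models (VLMs), with the potential to lower training costs and enable new applications in robotics and embodied AI.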

Ranking rationale: This cluster contains a research paper detailing a new method for 3D scene understanding in vision-language models (VLMs).


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Bartłomiej Baranowski, Dave Zhenyu Chen, Matthias Nießner · 2026-06-18 04:00

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

arXiv:2606.19253v1 Announce Type: cross Abstract: Existing approaches to 3D scene understanding in Vision-Language Models (VLMs) either rely on complex, model-specific geometry encoders or large training budgets in pursuit of spatial reasoning. Instead, OneCanvas aggregates patch…
arXiv cs.AI TIER_1 English(EN) · Matthias Nießner · 2026-06-17 16:29

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

Existing approaches to 3D scene understanding in Vision-Language Models (VLMs) either rely on complex, model-specific geometry encoders or large training budgets in pursuit of spatial reasoning. Instead, OneCanvas aggregates patch features from all views onto a single equirectang…
