PROSE: Training-Free Egocentric Scene Registration with Vision-Language Models

PROSE uses vision-language models for egocentric scene registration

By PulseAugur Editorial Team · [2 sources] · 2026-06-15 11:11

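Researchers have developed PROSE, a novel method that registers egocentric RGB sequences without training or depth sensors. PROSE uses pretrained vision-language models to build object-level 3D scene graphs and to match object instances across captures. On the Aria Digital Twin and Aria Everyday Activities benchmarks, it outperforms existing geometric and learned scene-graph methods.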
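To make the idea of registering two captures through matched object instances concrete, here is a minimal sketch. It is not the PROSE pipeline: it assumes each capture has already been reduced to a list of labelled object centroids, that both captures are gravity-aligned (so only a yaw rotation and a translation remain), and that matching on unique labels is good enough. The function names `match_by_label` and `fit_yaw_translation` are hypothetical and chosen only for this illustration.

```python
import math
from collections import defaultdict


def match_by_label(objs_a, objs_b):
    """Pair objects whose label occurs exactly once in each capture.

    Each object is (label, (x, y, z)). Ambiguous labels are skipped;
    this is a simplification made for the sketch.
    """
    by_a, by_b = defaultdict(list), defaultdict(list)
    for label, pos in objs_a:
        by_a[label].append(pos)
    for label, pos in objs_b:
        by_b[label].append(pos)
    pairs = []
    for label, pos_a in by_a.items():
        pos_b = by_b.get(label, [])
        if len(pos_a) == 1 and len(pos_b) == 1:
            pairs.append((pos_a[0], pos_b[0]))
    return pairs


def fit_yaw_translation(pairs):
    """Least-squares yaw (about z) and translation mapping capture A onto B."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two matched objects")
    # Centroids of the matched points in each capture.
    ca = [sum(a[i] for a, _ in pairs) / n for i in range(3)]
    cb = [sum(b[i] for _, b in pairs) / n for i in range(3)]
    s_dot = s_cross = 0.0
    for a, b in pairs:
        ax, ay = a[0] - ca[0], a[1] - ca[1]
        bx, by = b[0] - cb[0], b[1] - cb[1]
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    # Closed-form 2D Procrustes angle in the horizontal plane.
    yaw = math.atan2(s_cross, s_dot)
    c, s = math.cos(yaw), math.sin(yaw)
    t = (cb[0] - (c * ca[0] - s * ca[1]),
         cb[1] - (s * ca[0] + c * ca[1]),
         cb[2] - ca[2])
    return yaw, t


# Toy data (invented for illustration): two chairs are ambiguous and
# the plant appears only in the second capture, so both are ignored.
capture_a = [("sofa", (1.0, 0.0, 0.0)), ("table", (0.0, 1.0, 0.0)),
             ("lamp", (2.0, 2.0, 1.0)), ("chair", (0.0, 0.0, 0.0)),
             ("chair", (3.0, 0.0, 0.0))]
capture_b = [("sofa", (1.0, 3.0, 0.5)), ("table", (0.0, 2.0, 0.5)),
             ("lamp", (-1.0, 4.0, 1.5)), ("plant", (5.0, 5.0, 0.0))]

pairs = match_by_label(capture_a, capture_b)
yaw, translation = fit_yaw_translation(pairs)
```

Working at the object level means the alignment depends only on a handful of centroids, which is one reason such an approach can tolerate the appearance changes between captures taken at different times.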

Impact: By improving egocentric scene registration, the method could enable more robust spatial memory for robots and AR systems.

Ranking rationale: The cluster contains a research paper detailing a new method for scene registration using vision-language models.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV TIER_1 English(EN) · Zhiang Chen, Nahyuk Lee, Boyang Sun, Taein Kwon, Marc Pollefeys, Zuria Bauer, Sunghwan Hong · 2026-06-16 04:00

PROSE: Training-Free Egocentric Scene Registration with Vision-Language Models

arXiv:2606.16569v1. Abstract: Registering two captures of the same indoor space taken at different times underpins persistent spatial memory for robots and AR systems, yet the realistic version of this task is egocentric and its most scalable form is RGB-only. H…

arXiv cs.CV TIER_1 English(EN) · Sunghwan Hong · 2026-06-15 11:11

PROSE: Training-Free Egocentric Scene Registration with Vision-Language Models

Registering two captures of the same indoor space taken at different times underpins persistent spatial memory for robots and AR systems, yet the realistic version of this task is egocentric and its most scalable form is RGB-only. Head-mounted cameras yield blurry, fast-moving, p…