Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

GeoVR framework adds 3D spatial awareness to multimodal large language models

By the PulseAugur editorial team · [4 sources] · 2026-06-04 00:00

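Researchers have developed GeoVR, a new framework designed to give multimodal large language models (MLLMs) 3D spatial awareness. Using only 2D video sequences, the framework distills geometric knowledge from existing 3D foundation models into MLLMs. It adopts a multi-objective learning strategy with four geometric objectives, including camera pose estimation and depth map regression, to strengthen the model's internal representations. Experiments show that GeoVR achieves state-of-the-art performance on spatial reasoning benchmarks, offering a new approach to developing spatially intelligent foundation models.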
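The multi-objective strategy described above can be pictured as a weighted sum of per-objective distillation losses, where a 3D foundation model supplies the targets. The following is a minimal sketch under stated assumptions, not the authors' implementation: the summary names only two of the four objectives (camera pose estimation and depth map regression), so only those appear here, and the L1 loss, the weights, the function names, and the toy values are all hypothetical.

```python
# Hypothetical sketch: combine per-objective distillation losses into one scalar.
# Only "camera_pose" and "depth" are named in the summary; the L1 loss, the
# weights, and the toy numbers below are assumptions made for illustration.


def l1_loss(pred, target):
    """Mean absolute error between two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def multi_objective_loss(student_outputs, teacher_targets, weights):
    """Weighted sum of per-objective L1 losses.

    student_outputs, teacher_targets: dict of objective name -> list of floats
    weights: dict of objective name -> float
    """
    total = 0.0
    for name, weight in weights.items():
        total += weight * l1_loss(student_outputs[name], teacher_targets[name])
    return total


# Toy values standing in for student (MLLM) predictions and teacher
# (3D foundation model) targets.
student = {"camera_pose": [0.0, 0.5, 1.0], "depth": [2.0, 4.0]}
teacher = {"camera_pose": [0.0, 1.0, 1.0], "depth": [1.0, 4.0]}
weights = {"camera_pose": 1.0, "depth": 0.5}

loss = multi_objective_loss(student, teacher, weights)
print(loss)
```

In a real training setup the per-objective terms would likely use task-specific losses and learned prediction heads; the point of the sketch is only that several geometric targets contribute to a single training signal.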

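Impact: Strengthens the 3D spatial reasoning of multimodal large language models, which may improve applications such as robotics, AR/VR, and scene understanding.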

Ranking rationale: This cluster contains an academic paper that details a new framework and its experimental results.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-04 08:11

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, resulting in representations that fail to maintain geometric and spatial consistency across video frames. Given the scarcity of large-scale 3D data, we present GeoVR, a no…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-04 00:00

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

GeoVR enhances multimodal large language models with 3D awareness by restructuring their semantic latent space through geometric knowledge distillation from 3D foundation models using multiple geometric targets.
arXiv cs.CV TIER_1 English(EN) · Haibo Wang, Lifu Huang · 2026-06-05 04:00

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

arXiv:2606.05833v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, resulting in representations that fail to maintain geometric and spatial consistency across video frames. Given the scarcit…
arXiv cs.CV TIER_1 English(EN) · Lifu Huang · 2026-06-04 08:11

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, resulting in representations that fail to maintain geometric and spatial consistency across video frames. Given the scarcity of large-scale 3D data, we present GeoVR, a no…

