PointVG-R: Internalizing Geometric Reasoning in MLLMs for Precise Pointing Localization via Visual Chain of Thought

PointVG-R model strengthens visual grounding through geometric reasoning · Tracking 3 sources

By the PulseAugur editorial team · [3 sources] · 2026-06-23 13:06

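Researchers have developed PointVG-R, a novel reasoning-guided multimodal large language model (MLLM) designed to locate the target of a pointing gesture in an image more precisely. The model combines geometry-aware reasoning, reinforcement learning (RL), and a new visual chain-of-thought dataset named EgoPoint-CoT. PointVG-R mimics the cognitive process humans use to interpret pointing gestures and applies an adaptive importance-weighting strategy to optimize learning. Experiments show that PointVG-R achieves state-of-the-art performance, exceeding the baseline model by 15.86 percentage points in mIoU.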
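For readers unfamiliar with the metric, the sketch below shows how mean intersection-over-union (mIoU) is commonly computed for box-level localization. It is an illustration only: the sources summarized here do not say whether PointVG-R is scored on boxes or masks, and the names `box_iou` and `mean_iou`, along with the `(x_min, y_min, x_max, y_max)` box layout, are assumptions rather than anything taken from the paper.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes produce a zero union; report no overlap.
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], targets: Sequence[Box]) -> float:
    """Average IoU over paired predictions and ground-truth boxes."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        raise ValueError("at least one box pair is required")
    return sum(box_iou(p, t) for p, t in zip(preds, targets)) / len(preds)
```

Under this definition, a gain of 15.86 percentage points means the average overlap ratio rises by 0.1586 on a 0 to 1 scale, for example from 0.50 to 0.6586.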

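Impact: Strengthens the visual grounding capabilities of MLLMs, which may improve applications that require precise object localization from images.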

Ranking rationale: This cluster describes a research paper presenting a new model and dataset for visual grounding.


AI-generated summary · Google Gemini · from 3 sources.

Coverage sources [3]

Hugging Face Daily Papers · 2026-06-23 13:06

PointVG-R: Internalizing Geometric Reasoning in MLLMs for Precise Pointing Localization via Visual Chain of Thought

Pointing-based visual grounding requires models to precisely locate target objects by deciphering complex spatial relationships between the visual scene and pointing gestures. Traditional methods typically encode input images into static feature representations and perform reason…

arXiv cs.CV · Ling Li, Bowen Liu, Zinuo Zhan, Jianhui Zhong, Ziyu Zhu, Bingcai Wei, Kenglun Chang, Zhidong Deng · 2026-06-24 04:00

PointVG-R: Internalizing Geometric Reasoning in MLLMs for Precise Pointing Localization via Visual Chain of Thought

arXiv:2606.24539v1. Pointing-based visual grounding requires models to precisely locate target objects by deciphering complex spatial relationships between the visual scene and pointing gestures. Traditional methods typically encode input images into s…

arXiv cs.CV · Zhidong Deng · 2026-06-23 13:06

PointVG-R: Internalizing Geometric Reasoning in MLLMs for Precise Pointing Localization via Visual Chain of Thought

Pointing-based visual grounding requires models to precisely locate target objects by deciphering complex spatial relationships between the visual scene and pointing gestures. Traditional methods typically encode input images into static feature representations and perform reason…
