Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

Vision-Language Models (VLMs) Fail to Recognize When Spatial Reasoning Is Impossible

By the PulseAugur editorial team · [2 sources] · 2026-05-28 00:00

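A new research paper introduces SpatialUncertain, a framework for evaluating whether vision-language models (VLMs) can recognize when a spatial question cannot be answered because of occlusion or a misleading viewpoint. The study finds that current frontier VLMs tend to be overconfident, with error rates of about 70% under occlusion and above 90% under viewpoint ambiguity. Many models also struggle to determine which additional viewpoints would be needed to resolve such ambiguity, which underscores the need to evaluate VLMs' uncertainty awareness and evidence-seeking ability rather than answer correctness alone.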

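Impact: Highlights key limitations of VLMs in spatial reasoning and uncertainty awareness, motivating new evaluation methods that go beyond simple accuracy.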

Ranking rationale: This cluster contains a research paper introducing a new evaluation framework for VLMs.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Yue Zhang, Zun Wang, Han Lin, Yonatan Bitton, Idan Szpektor, Mohit Bansal · 2026-06-01 04:00

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

arXiv:2605.30557v1 Announce Type: cross Abstract: Spatial reasoning is a fundamental capability for vision-language models (VLMs) deployed in real-world environments. However, visual observations are inherently limited representations of a 3D world: occlusion can render objects i…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-28 00:00

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

Vision-language models exhibit overconfidence in spatial reasoning tasks and struggle to identify when additional observations are needed to resolve uncertainty.
