PulseAugur
Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

Vision-language models (VLMs) fail to recognize when spatial reasoning is impossible

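A new research paper introduces SpatialUncertain, a framework for evaluating whether vision-language models (VLMs) recognize when a spatial question cannot be answered because of occlusion or a misleading viewpoint. The study finds that current frontier VLMs tend to be overconfident, with error rates of roughly 70% under occlusion and above 90% under viewpoint ambiguity. Many models also struggle to identify which additional viewpoints would resolve the ambiguity, which underscores the need to evaluate a VLM's uncertainty awareness and evidence-seeking ability, not just the correctness of its answers.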

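Impact: Highlights key limitations of VLMs in spatial reasoning and uncertainty awareness, motivating new evaluation methods that go beyond simple accuracy.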

Ranking rationale: This cluster contains a research paper that introduces a new evaluation framework for VLMs.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI · Yue Zhang, Zun Wang, Han Lin, Yonatan Bitton, Idan Szpektor, Mohit Bansal

    Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

    arXiv:2605.30557v1. Abstract: Spatial reasoning is a fundamental capability for vision-language models (VLMs) deployed in real-world environments. However, visual observations are inherently limited representations of a 3D world: occlusion can render objects i…

  2. Hugging Face Daily Papers

    Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

    Vision-language models exhibit overconfidence in spatial reasoning tasks and struggle to identify when additional observations are needed to resolve uncertainty.