ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
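VLMs tackle visual illusions, spatial reasoning, and evaluation benchmarks
By the PulseAugur editorial team · [7 sources]
Researchers are developing new methods to improve the robustness and reasoning ability of vision-language models (VLMs). One approach, Structured Qualitative Inference (SQI), aims to mitigate visual illusions by strengthening visual grounding without fine-tuning the model. Another focus is improving how VLM spatial reasoning is evaluated: new benchmarks such as ReVSI have been developed to address systematic invalidity in current evaluations. In addition, work is under way to let VLMs reason about 3D space more effectively using geometrically referenced representations, and to explore latent visual reasoning that bypasses explicit language mediation.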
While Vision-Language Models (VLMs) have achieved state-of-the-art performance in general visual tasks, their perceptual robustness remains remarkably brittle when confronted with optical illusions. These failures are often attributed to shortcut heuristics, where models prioriti…
arXiv:2604.26250v1 Announce Type: new Abstract: While Vision-Language Models (VLMs) have achieved state-of-the-art performance in general visual tasks, their perceptual robustness remains remarkably brittle when confronted with optical illusions. These failures are often attribut…
arXiv:2603.08592v2 Announce Type: replace Abstract: While Multimodal Large Language Models (MLLMs) have achieved remarkable success in 2D visual understanding, their ability to reason about 3D space remains limited. To address this gap, we introduce geometrically referenced 3D sc…
arXiv:2604.24300v1 Announce Type: new Abstract: Current evaluations of spatial intelligence can be systematically invalid under modern vision-language model (VLM) settings. First, many benchmarks derive question-answer (QA) pairs from point-cloud-based 3D annotations originally c…
Current evaluations of spatial intelligence can be systematically invalid under modern vision-language model (VLM) settings. First, many benchmarks derive question-answer (QA) pairs from point-cloud-based 3D annotations originally curated for traditional 3D perception. When such …