New VLM evaluation method reveals poor evidence use in large models

By PulseAugur Editorial · [2 sources] · 2026-06-23 09:20

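A new research paper introduces "Ill-Posed by Design," a method for evaluating how vision-language models (VLMs) use evidence. The study proposes monocular metric object size estimation as a deliberately ill-posed task, one that forces models to rely on a range of imperfect cues such as category priors, appearance, and context. The researchers assembled a dataset called Metric VQA and tested 12 open-weight VLMs, finding that even the largest models perform worse in real scenes than a text-only LLM. The analysis shows that while target identity is critical, current VLMs still largely ignore global scene geometry, even after LoRA fine-tuning.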
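To illustrate why estimating metric size from a single image is ill-posed, here is a minimal sketch based on the standard pinhole camera model. It is not taken from the paper: the function name `projected_height_px`, the focal length, and the object sizes are all hypothetical values chosen for illustration. Under this model, an object's height in pixels depends only on the ratio of its real height to its depth, so a small nearby object and a large distant one can look identical.

```python
def projected_height_px(object_height_m, depth_m, focal_length_px):
    """Pinhole projection: pixel height of an upright object facing the camera."""
    return focal_length_px * object_height_m / depth_m


# Hypothetical camera and objects, chosen only for illustration.
FOCAL_PX = 1000.0

small_near = projected_height_px(object_height_m=0.5, depth_m=2.0, focal_length_px=FOCAL_PX)
large_far = projected_height_px(object_height_m=2.5, depth_m=10.0, focal_length_px=FOCAL_PX)

# Both objects span the same number of pixels, so pixel extent alone
# cannot determine metric size.
print(small_near, large_far)
```

Because the image alone does not fix the scale, any estimate has to draw on additional cues such as category priors, appearance, and context, which is what makes the task useful for probing which evidence a model relies on.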

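Impact: This research highlights the limitations of current VLMs in reasoning and evidence use, suggesting that improved architectures and training strategies are needed for complex scene understanding.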

Ranking rationale: Research paper detailing a new VLM evaluation method.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV TIER_1 English(EN) · Boaz Meivar, Shaked Perek, Shani Shvartzman, Eli Schwartz, Shai Avidan · 2026-06-24 04:00

Ill-Posed by Design: Probing Evidence Use in VLMs

arXiv:2606.24335v1 Announce Type: new Abstract: Counterfactual analysis is widely used to study evidence use in vision-language models, but its diagnostic value is limited on well-posed tasks: when several cues independently support the same answer, removing one may not change th…

arXiv cs.CV TIER_1 English(EN) · Shai Avidan · 2026-06-23 09:20

Ill-Posed by Design: Probing Evidence Use in VLMs

Counterfactual analysis is widely used to study evidence use in vision-language models, but its diagnostic value is limited on well-posed tasks: when several cues independently support the same answer, removing one may not change the prediction. We propose monocular metric object…
