New framework exposes counting bias in Vision-Language Models

By PulseAugur Editorial Team · [1 source] · 2026-05-18 04:00

Researchers have developed CounterCount, a framework for diagnosing counting bias in Vision-Language Models (VLMs). It uses paired factual and counterfactual images to test whether a VLM relies on visual evidence or on learned priors when the number of objects in an image differs from what is typical. In the reported evaluations, current VLMs perform well on factual images but struggle on counterfactual ones, which indicates reliance on object-level priors even when the visual evidence contradicts them. The analysis also found that models under-attend to count-relevant visual tokens, and the authors propose an attention modulation strategy to improve accuracy.
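The paired evaluation described above can be illustrated with a minimal sketch. It assumes each test item records the count shown in the factual image, the count shown in the counterfactual image, and the model's answer on each. The `PairedResult` schema, the `summarize` helper, and the "prior fallback rate" are hypothetical names chosen for illustration, not the paper's metric definitions, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class PairedResult:
    """One factual/counterfactual image pair (hypothetical schema)."""
    canonical_count: int       # count in the factual image, matching typical knowledge
    counterfactual_count: int  # count in the edited, counterfactual image
    pred_factual: int          # model answer on the factual image
    pred_counterfactual: int   # model answer on the counterfactual image


def summarize(results):
    """Return factual accuracy, counterfactual accuracy, and prior fallback rate."""
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    factual_hits = sum(r.pred_factual == r.canonical_count for r in results)
    counterfactual_hits = sum(
        r.pred_counterfactual == r.counterfactual_count for r in results
    )
    # Prior fallback: the model answers with the canonical count even though
    # the counterfactual image shows a different number of objects.
    fallbacks = sum(
        r.pred_counterfactual == r.canonical_count
        and r.canonical_count != r.counterfactual_count
        for r in results
    )
    return {
        "factual_accuracy": factual_hits / n,
        "counterfactual_accuracy": counterfactual_hits / n,
        "prior_fallback_rate": fallbacks / n,
    }


# Made-up illustrative answers, not results from the paper.
sample = [
    PairedResult(4, 5, 4, 4),  # falls back to the canonical count
    PairedResult(2, 3, 2, 3),  # follows the visual evidence
    PairedResult(5, 6, 5, 5),  # falls back to the canonical count
    PairedResult(4, 3, 4, 3),  # follows the visual evidence
]
report = summarize(sample)
print(report)
```

A large gap between factual and counterfactual accuracy, together with a high fallback rate, would be the pattern the summary describes: answers that track typical object knowledge instead of what the image shows.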

Impact: Exposes prior-driven counting failures in VLMs, which can guide the development of future models that better integrate visual evidence.

Ranking rationale: The cluster contains an academic paper detailing a new diagnostic framework for evaluating AI models.

Read on arXiv cs.CV

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV TIER_1 English(EN) · Bernard Ghanem · 2026-05-18 04:00

CounterCount: A Diagnostic Framework for Counting Bias in Vision Language Models

Vision-Language Models (VLMs) excel at multimodal reasoning, yet it remains unclear whether their answers are grounded in visual evidence or driven by learned language and world priors. Counting provides a precise testbed: when visual evidence conflicts with canonical object know…

