Researchers have identified a novel cause of hallucination in vision-language models (VLMs), attributing it to imbalances in how the system allocates attention across input modalities. Their study suggests that functionally redundant system weights can reduce attention to image and text inputs, producing a 'yes-bias' in which VLMs indiscriminately respond affirmatively. By redistributing attention from the system modality to image and text inputs, the researchers significantly suppressed this bias, outperforming existing methods and highlighting system attention as a critical factor in VLM hallucination mitigation.
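The core idea can be illustrated with a toy sketch: damp the attention mass assigned to system-prompt tokens and renormalize, so the freed mass flows to image and text tokens. This is a minimal, hypothetical illustration of attention reallocation in general, not the paper's actual algorithm; the function name, the `scale` parameter, and the token-segment layout are all assumptions.

```python
import numpy as np

def redistribute_attention(attn, system_idx, scale=0.5):
    """Illustrative sketch (not the paper's method): down-weight
    attention on system tokens by `scale`, then renormalize each
    row so the remaining mass shifts to image/text tokens."""
    attn = np.asarray(attn, dtype=float).copy()
    attn[..., system_idx] *= scale          # damp system-token attention
    attn /= attn.sum(axis=-1, keepdims=True)  # renormalize to a distribution
    return attn

# Toy example: one attention row over [system, image, text] tokens.
row = np.array([0.6, 0.25, 0.15])  # system token dominates
adjusted = redistribute_attention(row, system_idx=0)
print(adjusted)  # system share drops, image/text shares rise
```

In this toy case the system token's share falls from 0.60 to roughly 0.43 while the image and text shares grow proportionally, which is the qualitative effect the summary describes.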
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT: Introduces a new method to reduce VLM 'yes-bias' by reallocating system attention, potentially improving model reliability.
RANK_REASON: Academic paper on a novel approach to mitigating VLM hallucination.