Medical VLMs struggle with negated answers, new benchmark reveals

By PulseAugur Editorial · [1 source] · 2026-05-08 04:00

Researchers have introduced CXR-ContraBench, a benchmark that evaluates whether medical vision-language models (VLMs) handle negated answer options correctly in chest X-ray question answering. It exposes a failure mode in which models are drawn to negated options, producing clinically risky contradictions such as answering "No consolidation" when the image shows consolidation. Models such as MedGemma and Qwen2.5-VL show substantial failure rates, while a method called QCCV-Neg is reported to deterministically correct the polarity-confused subsets without retraining.
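To make the failure mode concrete, here is a minimal Python sketch of how one might count polarity reversals in a set of model answers. It is not the benchmark's scoring code and not QCCV-Neg: the negation cues, the function names `is_polarity_reversal` and `negated_attraction_rate`, and the sample data are all assumptions made for illustration.

```python
import re

# Hypothetical negation cues; the paper's actual detection rules are not
# described in this summary.
NEGATION_PREFIX = re.compile(r"^(no|without|absence of)\s+", re.IGNORECASE)


def is_polarity_reversal(truth: str, answer: str) -> bool:
    """Return True if `answer` is `truth` with a negation cue prepended."""
    stripped, count = NEGATION_PREFIX.subn("", answer.strip())
    return count > 0 and stripped.lower() == truth.strip().lower()


def negated_attraction_rate(items) -> float:
    """Fraction of (truth, answer) pairs where the answer negates the truth."""
    if not items:
        return 0.0
    hits = sum(is_polarity_reversal(truth, answer) for truth, answer in items)
    return hits / len(items)


# Illustrative data only: (finding present in the image, model's chosen option).
sample = [
    ("consolidation", "No consolidation"),     # polarity reversal
    ("pleural effusion", "Pleural effusion"),  # correct
    ("cardiomegaly", "Pneumothorax"),          # wrong, but not a reversal
    ("edema", "no edema"),                     # polarity reversal
]
rate = negated_attraction_rate(sample)
print(rate)  # 0.5
```

Counting reversals separately from ordinary wrong answers matters because the summary treats a negated answer as a contradiction of the image rather than a simple miss.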

IMPACT Introduces a benchmark to expose and address critical inference-time polarity failures in medical VLMs, potentially improving diagnostic accuracy.

RANK_REASON This is a research paper introducing a new benchmark and a method for evaluating and improving medical vision-language models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Zhengru Fang, Yanan Ma, Yu Guo, Senkang Hu, Yixian Zhang, Hangcheng Cao, Wenbo Ding, Yuguang Fang · 2026-05-08 04:00

CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs

arXiv:2605.05810v1 Announce Type: new Abstract: When a chest X-ray shows consolidation but the question asks which finding is present, a medical vision-language model may answer "No consolidation." This is more than an incorrect choice: it is a polarity reversal that emits a clin…
