Brief · PulseAugur

TOOL · arXiv cs.CV · English (EN) · 1w

Disentangling Visual and Factual Correctness in LVLMs' Visualization Literacy

Researchers have developed a framework that separates visual interpretation from factual recall in Large Vision-Language Models (LVLMs). Existing evaluations often conflate the two abilities, which makes it hard to tell whether a model is reasoning from the image or simply recalling an answer it already knows. In experiments with 15 state-of-the-art LVLMs on a counterfactual visualization literacy assessment, many models relied more on factual priors than on visual evidence when the two conflicted, a behavior that differs from that of human test subjects.

IMPACT Current benchmarks may overestimate LVLMs' visual reasoning because they do not separate interpreting a visualization from recalling facts. The findings point to a need for assessments that test the two abilities independently.

LVLMs
Large Vision-Language Models
CVLAT
reVLAT