Disentangling Visual and Factual Correctness in LVLMs' Visualization Literacy
Researchers have developed a new framework to distinguish between visual interpretation and factual recall in Large Vision-Language Models (LVLMs). Existing evaluations often conflate these two abilities, making it difficult to assess true visual reasoning. Experiments with 15 state-of-the-art LVLMs using a counterfactual visualization literacy assessment revealed that many models rely more on factual priors than on visual evidence when the two conflict, a behavior that differs from that of human test subjects.
AI IMPACT: This research highlights a critical gap in evaluating LVLMs, suggesting that current benchmarks may overestimate their visual reasoning capabilities and emphasizing the need for more robust assessment methods.
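
To make the distinction between visual and factual correctness concrete, here is a minimal sketch of how answers to counterfactual chart questions could be scored. It is not the researchers' actual method: the `Item` fields, the exact-match comparison, and the function names `classify` and `reliance_rates` are all hypothetical, chosen only for illustration. The idea is that each counterfactual item has one answer supported by the chart and a different answer supported by real-world knowledge, so a response can be attributed to one source or the other.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    """One counterfactual question (hypothetical schema, for illustration)."""
    question: str
    visual_answer: str   # answer supported by the (altered) chart
    factual_answer: str  # answer supported by real-world knowledge


def classify(item: Item, response: str) -> str:
    """Label a response as following the chart, the prior, or neither.

    Uses case-insensitive exact match, which is a simplifying assumption.
    """
    normalized = response.strip().lower()
    if normalized == item.visual_answer.strip().lower():
        return "visual"
    if normalized == item.factual_answer.strip().lower():
        return "factual"
    return "other"


def reliance_rates(items: list[Item], responses: list[str]) -> dict[str, float]:
    """Return the fraction of responses in each category."""
    if len(items) != len(responses):
        raise ValueError("items and responses must have the same length")
    if not items:
        return {"visual": 0.0, "factual": 0.0, "other": 0.0}
    counts = Counter(classify(i, r) for i, r in zip(items, responses))
    return {k: counts[k] / len(items) for k in ("visual", "factual", "other")}
```

Under these assumptions, a model whose "factual" rate exceeds its "visual" rate on conflicting items would be leaning on priors rather than on what the chart shows.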