Researchers have introduced CiteVQA, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to attribute answers to specific source regions within documents. Unlike previous evaluations that scored only the final answer, CiteVQA requires models to provide element-level bounding-box citations alongside their answers, assessing both jointly. The benchmark, comprising 1,897 questions across 711 PDFs, reveals a significant failure mode termed "Attribution Hallucination," in which models give correct answers but cite incorrect evidence, exposing a critical reliability gap in current document intelligence systems.
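To make the joint-assessment idea concrete, here is a minimal sketch of how an answer and its bounding-box citation might be scored together. This is an illustrative assumption, not CiteVQA's actual protocol: the function names (`iou`, `judge`), the exact-match answer check, and the 0.5 IoU threshold are all hypothetical.

```python
# Hypothetical sketch of joint answer+citation scoring (not CiteVQA's actual protocol).

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def judge(pred_answer, gold_answer, pred_box, gold_box, iou_thresh=0.5):
    """Classify a prediction by checking the answer and the citation jointly."""
    answer_ok = pred_answer.strip().lower() == gold_answer.strip().lower()
    citation_ok = iou(pred_box, gold_box) >= iou_thresh
    if answer_ok and not citation_ok:
        # The case the benchmark highlights: right answer, wrong evidence.
        return "attribution_hallucination"
    return "correct" if (answer_ok and citation_ok) else "wrong"
```

Under a scheme like this, a model that answers correctly but points at an unrelated page region is flagged rather than credited, which is exactly the gap that answer-only scoring misses.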
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT This benchmark exposes a critical flaw in current MLLMs' ability to cite their sources, with implications for trust and reliability in high-stakes applications.
RANK_REASON The cluster describes a new academic benchmark for evaluating AI models.