PulseAugur

New benchmark CiteVQA exposes "Attribution Hallucination" in MLLMs

Researchers have introduced CiteVQA, a benchmark that evaluates whether multimodal large language models (MLLMs) can attribute their answers to specific source regions within documents. Unlike earlier Doc-VQA evaluations, which scored only the final answer, CiteVQA requires models to return element-level bounding-box citations alongside each answer and scores the two jointly. The benchmark comprises 1,897 questions across 711 PDFs and reveals a failure mode termed "Attribution Hallucination": models often give the correct answer while citing the wrong evidence, a reliability gap in current document intelligence systems.
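To make the idea of joint scoring concrete, here is a minimal sketch of how an answer and its bounding-box citation might be checked together. It is not CiteVQA's actual metric: the `Prediction` record, the exact-match answer check, the intersection-over-union (IoU) test, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical box format: (x0, y0, x1, y1) in page coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Prediction:
    """Illustrative record: an answer plus one cited region on a page."""
    answer: str
    page: int
    box: Box


def classify(pred: Prediction, gold: Prediction, iou_threshold: float = 0.5) -> str:
    """Label a prediction by checking the answer and the citation jointly.

    The threshold and exact-match comparison are assumptions, not the
    benchmark's published scoring rules.
    """
    answer_ok = pred.answer.strip().lower() == gold.answer.strip().lower()
    cite_ok = pred.page == gold.page and iou(pred.box, gold.box) >= iou_threshold
    if answer_ok and cite_ok:
        return "correct_and_attributed"
    if answer_ok:
        # Right answer, wrong evidence: the case an answer-only score would miss.
        return "attribution_hallucination"
    return "wrong_answer"
```

Under an answer-only evaluation, the first two labels would be indistinguishable, which is the gap the summary describes.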

IMPACT The benchmark exposes a flaw in how current MLLMs cite their sources, which could undermine trust and reliability in high-stakes applications.

RANK_REASON The cluster describes a new academic benchmark for evaluating AI models.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.CV TIER_1 English (EN) · Conghui He

    CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

    Multimodal Large Language Models (MLLMs) have significantly advanced document understanding, yet current Doc-VQA evaluations score only the final answer and leave the supporting evidence unchecked. This answer-only approach masks a critical failure mode: a model can land on the c…