PulseAugur
New benchmark CiteVQA exposes "Attribution Hallucination" in LLMs

Researchers have introduced CiteVQA, a benchmark that evaluates multimodal large language models (MLLMs) on their ability to attribute answers to specific source regions within documents. Unlike previous evaluations that score only the final answer, CiteVQA requires models to supply element-level bounding-box citations alongside their answers and assesses both jointly. The benchmark, comprising 1,897 questions across 711 PDFs, reveals a failure mode the authors term "Attribution Hallucination," in which a model gives the correct answer but cites the wrong evidence, exposing a critical reliability gap in current document intelligence systems.
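The joint answer-plus-citation scoring described above can be sketched as follows. This is an illustrative sketch, not CiteVQA's actual metric: the box format, IoU threshold, and the `judge` function are assumptions made here for clarity.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def judge(answer_correct: bool, pred_box: Box, gold_box: Box,
          thresh: float = 0.5) -> str:
    """Classify a prediction jointly on answer and cited evidence.

    A correct answer with an ungrounded citation is the failure mode
    the paper calls attribution hallucination.
    """
    grounded = iou(pred_box, gold_box) >= thresh
    if answer_correct and grounded:
        return "correct"
    if answer_correct:
        return "attribution_hallucination"  # right answer, wrong evidence
    return "wrong_answer"
```

Scoring answer and citation jointly is what separates this from answer-only Doc-VQA evaluation: a model that answers correctly but cites an unrelated region is penalized rather than rewarded.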

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT This benchmark highlights a critical flaw in current LLMs' ability to cite sources, potentially impacting trust and reliability in high-stakes applications.

RANK_REASON The cluster describes a new academic benchmark for evaluating AI models.

Read on arXiv cs.CV →

COVERAGE [1]

  1. arXiv cs.CV TIER_1 · Conghui He

    CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

    Multimodal Large Language Models (MLLMs) have significantly advanced document understanding, yet current Doc-VQA evaluations score only the final answer and leave the supporting evidence unchecked. This answer-only approach masks a critical failure mode: a model can land on the c…