CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

New Benchmark CiteVQA Exposes "Attribution Hallucination" in LLMs

By the PulseAugur editorial team · [1 source] · 2026-05-13 01:54

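Researchers have introduced CiteVQA, a new benchmark designed to evaluate how accurately multimodal large language models (MLLMs) attribute their answers to specific source regions within a document. Unlike prior evaluations that score only the final answer, CiteVQA requires models to supply element-level bounding-box citations alongside each answer and evaluates the two jointly. The benchmark comprises 1,897 questions across 711 PDF files and reveals a significant problem termed "attribution hallucination," in which models frequently give the correct answer while citing the wrong evidence. This highlights a critical reliability gap in current document intelligence systems.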
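To make the idea of joint answer-and-citation scoring concrete, here is a minimal Python sketch. It is not CiteVQA's actual metric: the `Box` type, the exact-match answer check, the IoU-based citation check, and the 0.5 threshold are all assumptions chosen for illustration. It shows how a response can be correct on the answer yet still be flagged when its cited region misses the gold evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical element-level citation: a page number plus a bounding box."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 if they are on different pages."""
    if a.page != b.page:
        return 0.0
    iw = min(a.x1, b.x1) - max(a.x0, b.x0)
    ih = min(a.y1, b.y1) - max(a.y0, b.y0)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a.x1 - a.x0) * (a.y1 - a.y0)
    area_b = (b.x1 - b.x0) * (b.y1 - b.y0)
    return inter / (area_a + area_b - inter)


def joint_score(pred_answer, gold_answer, cited, gold, threshold=0.5):
    """Score answer and citation together (assumed rule, not the paper's)."""
    # Assumption: answers are compared by case-insensitive exact match.
    answer_ok = pred_answer.strip().lower() == gold_answer.strip().lower()
    # Assumption: a citation counts if any cited box overlaps any gold box.
    citation_ok = any(iou(c, g) >= threshold for c in cited for g in gold)
    return {
        "answer_ok": answer_ok,
        "citation_ok": citation_ok,
        "joint_ok": answer_ok and citation_ok,
        # Right answer, wrong evidence: the failure mode described above.
        "attribution_hallucination": answer_ok and not citation_ok,
    }


gold_boxes = [Box(page=1, x0=0, y0=0, x1=10, y1=10)]
wrong_cite = [Box(page=1, x0=50, y0=50, x1=60, y1=60)]
result = joint_score("42", "42", wrong_cite, gold_boxes)
print(result["attribution_hallucination"])
```

An answer-only evaluation would count this response as fully correct. Under the joint rule sketched here, it is marked as a right answer backed by the wrong evidence.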

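Impact: The benchmark highlights a key weakness in how current LLMs cite their sources, which may affect trust and reliability in high-stakes applications.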

Ranking rationale: This cluster describes a new academic benchmark for evaluating AI models.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Conghui He · 2026-05-13 01:54

CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

Multimodal Large Language Models (MLLMs) have significantly advanced document understanding, yet current Doc-VQA evaluations score only the final answer and leave the supporting evidence unchecked. This answer-only approach masks a critical failure mode: a model can land on the c…
