PulseAugur
ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

New benchmarks SciMDR and ShredBench evaluate multimodal LLMs on scientific document reasoning and document reconstruction

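Researchers have introduced ShredBench, a new benchmark for evaluating the semantic reasoning capabilities of multimodal large language models (MLLMs) when reconstructing documents from shredded fragments. The benchmark uses an automated pipeline to generate the shredded documents, which keeps the evaluation free of training-data contamination. Initial tests of current MLLMs show performance dropping significantly as fragmentation increases, indicating a gap in bridging visual discontinuities and performing fine-grained cross-modal reasoning.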
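To make the idea of automatically generated fragments concrete, here is a minimal toy sketch. It is not the ShredBench pipeline, which works on document images. As a stand-in it cuts a plain-text page into vertical strips and shuffles them, with the strip count acting as the fragmentation level. The function names `shred` and `reassemble`, the text-based setup, and the seed parameter are all assumptions made for illustration.

```python
import random


def shred(lines, n_strips, seed=0):
    """Cut a text page into n_strips vertical strips and shuffle them.

    Toy analogue only: a real pipeline would slice a document image.
    Returns the shuffled strips and the ground-truth order.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    # Column boundaries, spread as evenly as possible across the page.
    bounds = [round(i * width / n_strips) for i in range(n_strips + 1)]
    strips = [
        [row[bounds[i]:bounds[i + 1]] for row in padded]
        for i in range(n_strips)
    ]
    order = list(range(n_strips))
    random.Random(seed).shuffle(order)
    # shuffled[pos] holds the strip whose original index is order[pos].
    return [strips[i] for i in order], order


def reassemble(shuffled, order):
    """Undo the shuffle using the ground-truth order."""
    strips = [None] * len(order)
    for pos, original_index in enumerate(order):
        strips[original_index] = shuffled[pos]
    n_rows = len(strips[0])
    return ["".join(s[r] for s in strips).rstrip() for r in range(n_rows)]


if __name__ == "__main__":
    page = ["Invoice 2024", "Total: 42 USD", "Thanks"]
    for n in (2, 4, 8):  # a higher n means more fragmentation
        shuffled, order = shred(page, n, seed=1)
        assert reassemble(shuffled, order) == page
```

Because each page is shredded fresh from a seed, the shuffled inputs need not exist anywhere in a training corpus. A model under test would receive only the shuffled strips, and the ground-truth order would be kept for scoring.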

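Impact: Highlights the limitations of current MLLMs in reconstructing documents from fragmented sources and points to directions for future research.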

Ranking rationale: Introduces a new benchmark for evaluating MLLM performance on a specific task.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →

Sources [3]

  1. arXiv cs.CL TIER_1 English(EN) · Ziyu Chen, Yilun Zhao, Chengye Wang, Rilyn Han, Manasi Patwardhan, Arman Cohan ·

    SciMDR: Advancing Scientific Multimodal Document Reasoning

    arXiv:2603.12249v2. Abstract: Constructing scientific multimodal document reasoning datasets for foundation model training involves an inherent trade-off among scale, faithfulness, and realism. To address this challenge, we introduce the synthesize-and-regro…

  2. arXiv cs.CL TIER_1 English(EN) · Wenping Ma ·

    ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

    Multimodal Large Language Models (MLLMs) have achieved remarkable performance in Visually Rich Document Understanding (VRDU) tasks, but their capabilities are mainly evaluated on pristine, well-structured document images. We consider content restoration from shredded fragments, a…

  3. arXiv cs.CV TIER_1 English(EN) · Zichun Guo, Yuling Shi, Wenhao Zeng, Chao Hu, Haotian Lin, Terry Yue Zhuo, Jiawei Chen, Xiaodong Gu, Wenping Ma ·

    ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

    arXiv:2604.23813v1. Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable performance in Visually Rich Document Understanding (VRDU) tasks, but their capabilities are mainly evaluated on pristine, well-structured document images. We conside…