Brief · PulseAugur

RESEARCH · arXiv cs.CL English(EN) · 1mo · [3 sources]

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

Researchers have introduced ShredBench, a benchmark that evaluates how well multimodal large language models (MLLMs) reason semantically when reconstructing documents from shredded fragments. The benchmark uses an automated pipeline to generate the fragmented documents, which keeps evaluations free of training-data contamination. Initial tests on current MLLMs show performance dropping significantly as fragmentation increases, indicating that the models struggle to bridge visual discontinuities and to perform fine-grained cross-modal reasoning.

IMPACT Shows that current MLLMs struggle to reconstruct documents from fragmented sources, pointing to the handling of visual discontinuities and fine-grained cross-modal reasoning as areas for future research.

MLLMs
English
Markdown
Chinese
ShredBench
Code
Table