Researchers have introduced PureDocBench, a new benchmark for document parsing that addresses shortcomings of the existing OmniDocBench dataset, which suffers from annotation errors and potential contamination. PureDocBench is programmatically generated and source-traceable, enabling more reliable evaluation across clean, digitally degraded, and real-world document settings. Initial evaluations of 40 models show that document parsing is far from solved, with significant performance gaps between models and a shared bottleneck in formula recognition.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT PureDocBench provides a more reliable evaluation for document parsing models, highlighting current limitations and guiding future research.
RANK_REASON The cluster describes a new benchmark for evaluating document parsing models, along with findings from its initial application.