DTBench: A Synthetic Benchmark for Document-to-Table Extraction
Researchers have introduced new benchmarks and improved models for document parsing and table extraction. Dr. DocBench targets expert-level document parsing, including complex structures such as chemical formulas and music notation, and highlights the limitations of current models. DTBench is a synthetic benchmark for document-to-table extraction that evaluates LLMs on reasoning and conflict resolution. In addition, PaddleOCR-VL-1.6 has been enhanced with region-aware optimization and progressive post-training, achieving state-of-the-art results on OmniDocBench v1.6.
Impact: Advances in document and table extraction benchmarks and models are expected to improve AI's ability to process and analyze complex documents and data.