Two new benchmarks, TableVista and WildTableBench, have been released to evaluate how well multimodal foundation models understand tables. TableVista targets visual and structural complexity with 30,000 samples, revealing that current models struggle with complex layouts and vision-only settings. WildTableBench covers real-world table images collected from online sources, with 928 questions spanning 17 subtypes; most evaluated models perform poorly on it, with only one exceeding 50% accuracy.
Summary written by gemini-2.5-flash-lite from 3 sources.
IMPACT Highlights critical gaps in current multimodal AI capabilities for table understanding, particularly on visually and structurally complex data.
RANK_REASON Two new academic papers introduce benchmarks for evaluating multimodal table reasoning in foundation models.