Researchers have developed HakushoBench, a new benchmark for evaluating how well vision-language models (VLMs) understand Japanese charts and tables. The dataset is drawn from 33 Japanese government white papers and contains over 2,000 images and manually annotated question-answer pairs. Initial experiments show a significant performance gap between open-weight and proprietary models, indicating substantial room for improvement in VLM capabilities for complex, non-English document analysis.
AI IMPACT Establishes a new evaluation standard for VLM performance on non-English visual data, potentially driving improvements in multilingual document understanding.
AI-generated summary · Google Gemini · from 2 sources.