Researchers have introduced EpiCurveBench, a new benchmark for evaluating vision-language models (VLMs) on extracting data from epidemic curve charts. The benchmark comprises 1,000 real-world epidemic curve images and a new evaluation metric, EpiCurveSimilarity (ECS). Unlike existing key-value extraction metrics, ECS aligns predicted and ground-truth series using dynamic programming, which is designed to better capture the temporal structure of time-series data. In initial evaluations, even the strongest VLMs achieve only 52.3% ECS, highlighting both the difficulty of the task and the limitations of current evaluation methods.
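The summary does not give the ECS formula. As a hedged illustration of what "aligning series with dynamic programming" can mean, the sketch below uses classic dynamic time warping (DTW), a standard DP alignment. The functions `dtw_distance` and `series_similarity`, and the normalisation into a 0 to 1 score, are hypothetical and are not the paper's definition of ECS.

```python
import math


def dtw_distance(pred: list[float], truth: list[float]) -> float:
    """Classic dynamic time warping (hypothetical stand-in, not ECS itself):
    the minimal cumulative |pred[i] - truth[j]| cost over monotonic
    alignments of the two series."""
    n, m = len(pred), len(truth)
    # cost[i][j] = best cumulative cost of aligning pred[:i] with truth[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(pred[i - 1] - truth[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance pred only (truth point reused)
                cost[i][j - 1],      # advance truth only (pred point reused)
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


def series_similarity(pred: list[float], truth: list[float]) -> float:
    """Hypothetical 0..1 score: 1 minus DTW cost relative to the total
    magnitude of the ground-truth series, clipped at 0."""
    if not pred or not truth:
        return 0.0
    d = dtw_distance(pred, truth)
    scale = sum(abs(v) for v in truth) or 1.0  # avoid dividing by zero
    return max(0.0, 1.0 - d / scale)
```

For example, a prediction that prepends one spurious value, `[0, 1, 2, 3]` against a ground truth of `[1, 2, 3]`, costs only 1 under DTW (similarity 5/6 with this normalisation), whereas comparing values strictly by position would misalign every subsequent point.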
IMPACT This benchmark and metric could lead to more accurate VLM performance evaluations for time-series chart extraction, with potential applications in public health data analysis.
RANK_REASON The cluster describes a new academic paper introducing a benchmark and evaluation metric for VLM performance on a specific task.
AI-generated summary · Google Gemini · from 2 sources.