
New Benchmark Evaluates VLMs on Extracting Data from Epidemic Curves

Researchers have introduced EpiCurveBench, a benchmark for evaluating vision-language models (VLMs) on extracting data from epidemic curve charts. It comprises 1,000 real-world epidemic curve images and a new evaluation metric, EpiCurveSimilarity (ECS). Unlike existing key-value extraction metrics, ECS aligns predicted and ground-truth series using dynamic programming, which lets it capture the temporal structure of time-series data. In initial evaluations, even the strongest VLMs reach only 52.3% ECS, which highlights both the difficulty of the task and the limitations of current evaluation methods.
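
To make the idea of aligning two series with dynamic programming concrete, here is a minimal sketch based on classic dynamic time warping (DTW). It is an illustration only: the summary does not give the ECS formula, so DTW is a stand-in technique, and the names `dtw_cost` and `toy_similarity`, along with the normalization they use, are hypothetical and not taken from the paper.

```python
from typing import Sequence


def dtw_cost(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Minimum total absolute difference over order-preserving alignments (classic DTW)."""
    n, m = len(pred), len(truth)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    # cost[i][j] holds the best cost of aligning pred[:i] with truth[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(pred[i - 1] - truth[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def toy_similarity(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Hypothetical score in [0, 1]; an assumption for illustration, NOT the paper's ECS."""
    scale = sum(abs(v) for v in truth) or 1.0
    return max(0.0, 1.0 - dtw_cost(pred, truth) / scale)


if __name__ == "__main__":
    truth = [1.0, 2.0, 3.0]
    print(toy_similarity([1.0, 2.0, 2.0, 3.0], truth))  # one duplicated point
    print(toy_similarity([3.0, 2.0, 1.0], truth))       # same values, reversed order
```

Because the alignment must preserve order, a reversed series incurs a nonzero cost even though it contains the same set of values, which is the kind of temporal sensitivity an unordered key-value comparison lacks.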

IMPACT This benchmark and metric could lead to more accurate VLM performance evaluations for time-series chart extraction, with potential applications in public health data analysis.



AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Thomas Berkane, Maimuna S. Majumder ·

    EpiCurveBench: Evaluating VLMs on Epidemic Curve Digitization

    arXiv:2605.27195v1 · Abstract: Chart-to-data extraction with vision-language models (VLMs) is increasingly evaluated on benchmarks that show diminishing headroom (frontier VLMs exceed 89% on ChartQA) and with metrics that treat extracted points as unordered key-v…

  2. arXiv cs.CL TIER_1 English(EN) · Maimuna S. Majumder ·

    EpiCurveBench: Evaluating VLMs on Epidemic Curve Digitization

    Chart-to-data extraction with vision-language models (VLMs) is increasingly evaluated on benchmarks that show diminishing headroom (frontier VLMs exceed 89% on ChartQA) and with metrics that treat extracted points as unordered key-value pairs, ignoring the temporal structure of t…