PulseAugur

New benchmarks DRAGON and OmniSch test LMMs on diagram reasoning

Researchers have introduced DRAGON, a new benchmark designed to evaluate how well vision-language models (VLMs) can ground their reasoning in specific visual evidence within diagrams. The benchmark addresses a known limitation: models can arrive at correct answers through spurious correlations rather than genuine understanding of the visual information. DRAGON includes over 11,000 annotated question instances drawn from six existing diagram QA datasets, with a test set featuring human-verified reasoning-evidence annotations. Eight VLMs were evaluated on their ability to localize this evidence across various diagram types, with the aim of improving interpretability and reliability in diagram-based reasoning.
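Evidence localization of this kind is commonly scored by comparing a model's predicted evidence region against the human-annotated one. The sketch below is illustrative only, not DRAGON's actual protocol: it assumes axis-aligned bounding boxes and a standard intersection-over-union (IoU) match criterion, with the 0.5 threshold as an arbitrary example value.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to zero when boxes are disjoint.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def localization_accuracy(preds, golds, threshold=0.5):
    """Fraction of predicted evidence boxes whose IoU with the gold
    annotation clears the threshold (threshold is an assumed value)."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, golds))
    return hits / len(golds)
```

For example, a prediction that exactly matches its gold box scores IoU 1.0 and counts as a hit, while a box with no overlap scores 0.0 and counts as a miss.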

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT Improves evaluation of visual reasoning in diagrams, pushing for more interpretable and reliable AI systems.

RANK_REASON This is a research paper introducing a new benchmark for evaluating AI models.

Read on arXiv cs.CV →

COVERAGE [4]

  1. Hugging Face Daily Papers TIER_1

    DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams

    Diagram question answering (DQA) requires models to interpret structured visual representations such as charts, maps, infographics, circuit schematics, and scientific diagrams. Recent vision-language models (VLMs) often achieve high answer accuracy on these tasks, yet correct ans…

  2. arXiv cs.CV TIER_1 · Anirudh Iyengar Kaniyar Narayana Iyengar, Tampu Ravi Kumar, Gaurav Najpande, Manan Suri, Dinesh Manocha, Puneet Mathur, Vivek Gupta

    DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams

    arXiv:2604.25231v1 Announce Type: new Abstract: Diagram question answering (DQA) requires models to interpret structured visual representations such as charts, maps, infographics, circuit schematics, and scientific diagrams. Recent vision-language models (VLMs) often achieve high…

  3. arXiv cs.CV TIER_1 · Vivek Gupta

    DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams

    Diagram question answering (DQA) requires models to interpret structured visual representations such as charts, maps, infographics, circuit schematics, and scientific diagrams. Recent vision-language models (VLMs) often achieve high answer accuracy on these tasks, yet correct ans…

  4. arXiv cs.CV TIER_1 · Taiting Lu, Kaiyuan Lin, Yuxin Tian, Mingjia Wang, Yubo Wang, Muchuan Wang, Sharique Khatri, Akshit Kartik, Yixi Wang, Amey Santosh Rane, Yida Wang, Sung-Liang Chen, Yifan Yang, Yi-Chao Chen, Yincheng Jin, Mahanth Gowda

    OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning

    arXiv:2604.00270v2 Announce Type: replace Abstract: Recent large multimodal models (LMMs) have made rapid progress in visual grounding, document understanding, and diagram reasoning tasks. However, their ability to convert Printed Circuit Board (PCB) schematic diagrams into machi…