PulseAugur

New Dataset 'DiagramBank' Curates 57K Scientific Diagrams for AI Research

Researchers have developed DiagramBank, a dataset of more than 57,000 schematic diagrams extracted from AI and ML papers hosted on OpenReview. Each diagram is linked to multiple levels of document context: its source paper's title and abstract, its figure caption, and the in-text references that cite it. DiagramBank is intended to support research on scientific document understanding and diagram retrieval, and to serve as a basis for new benchmarks. A manual audit of the dataset reported a precision of 93.67%.
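To make the idea of "multi-level document context" concrete, here is a minimal sketch of how a single diagram entry might be represented, assuming a flat per-diagram record. The names `DiagramRecord`, `context_text`, and `audit_precision` are hypothetical illustrations, not DiagramBank's published schema or tooling. The precision helper assumes "precision" means the fraction of audited entries judged correct; the summary does not state the audit sample size, so no real counts are used.

```python
from dataclasses import dataclass, field


@dataclass
class DiagramRecord:
    """Hypothetical layout for one schematic diagram plus its context.

    Field names are illustrative assumptions, not DiagramBank's actual schema.
    """
    image_path: str     # location of the extracted diagram image
    paper_title: str    # document-level context
    abstract: str       # document-level context
    caption: str        # figure-level context
    # Passages in the paper body that refer to this figure (reference-level context).
    in_text_references: list[str] = field(default_factory=list)

    def context_text(self) -> str:
        """Join all non-empty context levels into one string, e.g. for a text index."""
        parts = [self.paper_title, self.abstract, self.caption, *self.in_text_references]
        return "\n".join(p for p in parts if p)


def audit_precision(correct: int, audited: int) -> float:
    """Fraction of manually audited entries judged correct (assumed definition)."""
    if audited <= 0:
        raise ValueError("audited must be positive")
    return correct / audited


if __name__ == "__main__":
    record = DiagramRecord(
        image_path="fig1.png",  # hypothetical file name
        paper_title="Example paper",
        abstract="Example abstract.",
        caption="Figure 1: Overview of the pipeline.",
        in_text_references=["As shown in Figure 1, ..."],
    )
    print(record.context_text())
    print(audit_precision(9, 10))  # toy counts, not DiagramBank's audit figures
```

Keeping every context level on the record lets a retrieval system choose how much context to use, for example caption only versus caption plus abstract, without re-parsing the source paper.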

IMPACT Provides a structured resource to improve AI model understanding of scientific diagrams and their context.

RANK_REASON The cluster describes the release of a new dataset for AI/ML research, including its methodology and audit results.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI · TIER_1 · English (EN) · Ling Yue, Tingwen Zhang, Jiaying Wang, Zhen Xu, Shaowu Pan

    DiagramBank: A Quality-Audited Dataset of Scientific Schematic Diagrams with Multi-Level Document Context

    arXiv:2604.20857v2 · Announce Type: replace-cross

    Abstract: Scientific papers use schematic diagrams to communicate methods, workflows, and system structure, yet existing scientific-figure corpora often mix them with plots, screenshots, and photographs and rarely preserve document …