Brief · PulseAugur

RESEARCH · arXiv cs.CV · English (EN) · 1w · [2 sources]

Enginuity: A Dataset and Benchmark for Vision-Language Understanding of Engineering Diagrams

Researchers have introduced Enginuity, a dataset and benchmark for evaluating vision-language models (VLMs) on complex engineering diagrams. The dataset, derived from U.S. military manuals, includes tasks for extracting parts tables and answering visual questions about diagrams. Initial evaluations of leading VLMs, including GPT-5.2 Chat and Claude Opus 4.7, revealed significant gaps in their ability to describe parts accurately and to reason factually within this specialized domain.

IMPACT: This benchmark will help drive VLM development for specialized technical domains, potentially improving AI's utility in engineering and maintenance.

Claude Opus 4.7
Qwen3-VL-32B-Instruct
Gemma 4
Enginuity
GPT-5.2 Chat