MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding
Researchers have introduced MechVQA, a new dataset designed to evaluate and improve multimodal large language models (MLLMs) on understanding mechanical engineering drawings. The dataset contains roughly 3,300 images and more than 21,000 question-answer pairs spanning ten fine-grained tasks. It targets a domain where current MLLMs struggle, because mechanical drawings are densely annotated and require specialized domain knowledge. To establish a strong baseline, the team also developed the MechVL model, which outperformed existing models by 7.57 percentage points on the MechVQA benchmark, suggesting improved capabilities for applications in mechanical design and inspection.
IMPACT: Establishes a new benchmark for multimodal LLMs in mechanical engineering, potentially improving AI applications in design and inspection.