Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation
Researchers have developed EngVQA, a benchmark for evaluating the engineering reasoning capabilities of Vision-Language Models (VLMs). The benchmark comprises 696 problems across five engineering subjects and uses an 8-stage evaluation framework to assess intermediate reasoning steps, not just final answers. Initial benchmarking of state-of-the-art VLMs revealed significant limitations in their engineering reasoning.
IMPACT: Highlights the need for more robust evaluation methods for AI in specialized domains such as engineering.