VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark
Researchers have introduced VAMPS, a benchmark for evaluating how well multimodal large language models solve mathematical problems with the help of visual aids. The benchmark includes over a thousand bilingual question-answer pairs, many of which are naturally solved by plotting graphs. Initial findings indicate that direct analytical solving currently outperforms tool-enabled visual solving, even on problems where visualization is a suitable strategy.
AI IMPACT: Highlights a current limitation in multimodal LLMs' ability to integrate visual tools into complex mathematical reasoning, suggesting areas for future model development.