PulseAugur

New dataset and model enhance multimodal math reasoning with diverse perspectives

Researchers have introduced MathV-DP, a dataset designed to improve multimodal mathematical reasoning by capturing multiple diverse solution trajectories for each image-question pair, which gives richer supervision than traditional one-to-one image-text pairings. They also developed Qwen-VL-DP, a model built on Qwen-VL that is fine-tuned with supervised learning and enhanced with group relative policy optimization (GRPO). The GRPO stage combines correctness discrimination with diversity-aware rewards, so the model learns from varied reasoning perspectives and can distinguish between solutions that are correct but different. In experiments on the MathVista and Math-V benchmarks, Qwen-VL-DP significantly outperforms existing multimodal LLMs in both accuracy and generative diversity.
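
To make the reward idea in the summary concrete, here is a minimal sketch of how a correctness signal and a diversity bonus could feed a group-relative advantage. It is not the paper's implementation: the function names, the token-level Jaccard distance used as a diversity measure, the `diversity_weight` value, and the choice to grant the diversity bonus only to correct answers are all assumptions made for illustration. The only part taken from the general GRPO recipe is normalizing each sampled response's reward by the mean and standard deviation of its own group.

```python
from statistics import mean, pstdev


def jaccard_distance(a: str, b: str) -> float:
    """Token-level Jaccard distance; a crude, assumed stand-in for a diversity measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def diversity_scores(responses: list[str]) -> list[float]:
    """Average distance from each response to the other responses in its group."""
    scores = []
    for i, response in enumerate(responses):
        others = [jaccard_distance(response, other)
                  for j, other in enumerate(responses) if j != i]
        scores.append(mean(others) if others else 0.0)
    return scores


def combined_reward(is_correct: bool, diversity: float,
                    diversity_weight: float = 0.5) -> float:
    """Hypothetical reward: 1.0 for a correct answer plus a weighted diversity bonus.

    Assumption: incorrect answers earn no diversity bonus, so being different
    is never rewarded on its own.
    """
    correctness = 1.0 if is_correct else 0.0
    return correctness * (1.0 + diversity_weight * diversity)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical sampled solutions to the same image-question pair.
    responses = [
        "use the pythagorean theorem on the right triangle",
        "use the pythagorean theorem on the right triangle",
        "apply the law of cosines with a ninety degree angle",
        "guess the longest side",
    ]
    correct = [True, True, True, False]
    rewards = [combined_reward(c, d)
               for c, d in zip(correct, diversity_scores(responses))]
    print(group_relative_advantages(rewards))
```

Under these assumptions, a correct solution that differs from the rest of its group receives a higher advantage than a correct solution that repeats another sample, while an incorrect solution falls below the group mean regardless of how different it is.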

IMPACT Enhances multimodal LLMs for mathematical tasks by incorporating diverse reasoning paths, potentially improving accuracy and generative diversity.

RANK_REASON The cluster describes a new dataset and a fine-tuned model for multimodal mathematical reasoning, presented in an arXiv paper.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.CL TIER_1 English (EN) · Wenhao Shi, Zhiqiang Hu, Yi Bin, Guoqing Wang, Xing Xu, Yang Yang, See-Kiong Ng

    Multimodal Mathematical Reasoning with Diverse Solving Perspective

    arXiv:2507.02804v2. Abstract: Recent progress in large-scale reinforcement learning (RL) has notably enhanced the reasoning capabilities of large language models (LLMs), especially in mathematical domains. However, current multimodal LLMs (MLLMs) for mathema…