Researchers have introduced MathV-DP, a new dataset designed to improve multimodal mathematical reasoning by capturing multiple diverse solution trajectories for each image-question pair. The dataset aims to provide richer supervision than traditional one-to-one image-text pairings. The authors also developed Qwen-VL-DP, a model built on Qwen-VL and trained with supervised learning followed by reinforcement learning based on group relative policy optimization (GRPO). The reinforcement learning stage combines a correctness-discrimination reward with a diversity-aware reward, so the model learns from varied reasoning perspectives and can recognize solutions that are correct yet differ from one another. Experiments on the MathVista and Math-V benchmarks reportedly show Qwen-VL-DP significantly outperforming existing multimodal LLMs in both accuracy and generative diversity.
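To make the reward design concrete, here is a minimal sketch of how a correctness reward and a diversity-aware reward could be combined and then turned into group-relative advantages, which is the core normalization step in GRPO. It is not the paper's implementation: the Jaccard-based diversity measure, the `diversity_weight` parameter, the exact-match correctness check, and all function names are assumptions chosen for illustration.

```python
import statistics


def jaccard_distance(a: str, b: str) -> float:
    """Token-set distance in [0, 1]; a hypothetical stand-in for a diversity metric."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def diversity_scores(responses: list[str]) -> list[float]:
    """Mean distance from each response to every other response in the group."""
    n = len(responses)
    if n < 2:
        return [0.0] * n
    return [
        sum(jaccard_distance(responses[i], responses[j]) for j in range(n) if j != i)
        / (n - 1)
        for i in range(n)
    ]


def combined_rewards(
    responses: list[str],
    answers: list[str],
    gold: str,
    diversity_weight: float = 0.5,  # assumed value, not taken from the paper
) -> list[float]:
    """Correctness reward plus a diversity bonus granted only to correct answers."""
    diversity = diversity_scores(responses)
    rewards = []
    for answer, d in zip(answers, diversity):
        correct = 1.0 if answer == gold else 0.0
        rewards.append(correct + diversity_weight * correct * d)
    return rewards


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: each reward standardized against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

In this sketch the diversity bonus is gated on correctness, so an incorrect but unusual response is never rewarded for being different. Whether the paper gates the bonus the same way is not stated in this summary.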
AI IMPACT Enhances multimodal LLMs for mathematical tasks by incorporating diverse reasoning paths, potentially improving accuracy and generative diversity.
RANK_REASON The cluster describes a new dataset and a fine-tuned model for multimodal mathematical reasoning, presented in an arXiv paper.
AI-generated summary · Google Gemini · from 1 source.