New DRA-GRPO method boosts LLM math reasoning by encouraging diverse paths

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

Researchers have introduced DRA-GRPO, a framework for improving mathematical reasoning in large language models. It targets what the authors call Diversity-Quality Inconsistency in standard Group Relative Policy Optimization (GRPO), which relies on scalar correctness rewards. The approach calibrates reward signals using semantic density and Submodular Mutual Information to de-bias gradient estimation, encouraging the model to explore a wider range of valid reasoning strategies. On five mathematical benchmarks, DRA-GRPO is reported to outperform existing methods, reaching 58.2% accuracy with the DeepSeek-R1-Distill-Qwen-1.5B model while using a limited number of training samples and a low training cost.
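The reward calibration described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each sampled completion already has an embedding vector, uses a simple sum of clipped cosine similarities as a stand-in for the Submodular Mutual Information term, and divides each reward by one plus that redundancy score. The function names `diversity_adjusted_rewards` and `group_relative_advantages` are hypothetical, and the paper's exact formulation may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def diversity_adjusted_rewards(rewards, embeddings):
    """Down-weight rewards of completions that are semantically redundant.

    Assumption: redundancy of completion i is the sum of its non-negative
    cosine similarities to every other completion in the group.
    """
    adjusted = []
    for i, reward in enumerate(rewards):
        redundancy = sum(
            max(0.0, cosine(embeddings[i], other))
            for j, other in enumerate(embeddings)
            if j != i
        )
        adjusted.append(reward / (1.0 + redundancy))
    return adjusted


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy group of four completions for one prompt (made-up values):
# the first two are correct and near-identical, the third is correct
# but takes a different path, the fourth is incorrect.
rewards = [1.0, 1.0, 1.0, 0.0]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

adjusted = diversity_adjusted_rewards(rewards, embeddings)
advantages = group_relative_advantages(adjusted)
```

In this toy group, plain correctness rewards would give all three correct completions the same advantage. After the adjustment, the correct completion that follows a distinct path receives a larger advantage than the two redundant ones, which is the intuition behind encouraging diverse reasoning paths.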

IMPACT Enhances LLM mathematical reasoning by promoting diverse problem-solving strategies, potentially improving performance on complex tasks.

RANK_REASON The cluster contains an academic paper detailing a new method for improving LLM reasoning capabilities.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Xiwen Chen, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hao Wang, Haiyu Wu, Huayu Li, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi · 2026-06-16 04:00

DRA-GRPO: Your GRPO Needs to Know Diverse Reasoning Paths for Mathematical Reasoning

arXiv:2505.09655v5 Announce Type: replace Abstract: Post-training LLMs with Reinforcement Learning, specifically Group Relative Policy Optimization (GRPO), has emerged as a paradigm for enhancing mathematical reasoning. However, standard GRPO relies on scalar correctness rewards …
