PulseAugur

New DRA-GRPO method boosts LLM math reasoning by encouraging diverse paths

Researchers have introduced DRA-GRPO, a framework that aims to improve mathematical reasoning in large language models by addressing what the authors call the Diversity-Quality Inconsistency in standard GRPO. The approach calibrates reward signals using semantic density and Submodular Mutual Information to de-bias gradient estimation, encouraging the model to explore a wider range of valid reasoning strategies. According to the paper, DRA-GRPO outperforms existing methods across five mathematical benchmarks, reaching an average accuracy of 58.2% with the DeepSeek-R1-Distill-Qwen-1.5B model while using a limited number of training samples and a low training cost.
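To make the reward-calibration idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary does not give the exact formula, so the code assumes a simple graph-cut-style redundancy score (the summed, non-negative cosine similarity between one completion's embedding and the others in its group) and divides each reward by it. The function names, the `lam` weight, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def diversity_adjusted_rewards(rewards, embeddings, lam=0.5):
    """Down-weight each reward by how redundant its completion is within the group.

    Assumption (not from the source): redundancy is the sum of clamped cosine
    similarities to every other completion, and lam controls its strength.
    """
    adjusted = []
    for i, (reward, emb_i) in enumerate(zip(rewards, embeddings)):
        redundancy = sum(
            max(0.0, cosine(emb_i, emb_j))  # clamp so the divisor stays >= 1
            for j, emb_j in enumerate(embeddings)
            if j != i
        )
        adjusted.append(reward / (1.0 + lam * redundancy))
    return adjusted


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style normalization: center and scale rewards within a group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# Toy group: three correct answers (reward 1) and one wrong answer (reward 0).
# Completions 0 and 1 share a reasoning path; completion 2 takes a different one.
toy_rewards = [1.0, 1.0, 1.0, 0.0]
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

toy_adjusted = diversity_adjusted_rewards(toy_rewards, toy_embeddings)
toy_advantages = group_relative_advantages(toy_adjusted)
```

In this toy group the correct completion that follows a distinct path keeps its full reward, while the two near-duplicate correct completions are discounted, so the distinct path ends up with the largest advantage. With plain correctness rewards, all three correct completions would receive the same advantage.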

IMPACT Enhances LLM mathematical reasoning by promoting diverse problem-solving strategies, potentially improving performance on complex tasks.

RANK_REASON The cluster contains an academic paper detailing a new method for improving LLM reasoning capabilities.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL TIER_1 English(EN) · Xiwen Chen, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hao Wang, Haiyu Wu, Huayu Li, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

    DRA-GRPO: Your GRPO Needs to Know Diverse Reasoning Paths for Mathematical Reasoning

    arXiv:2505.09655v5. Abstract: Post-training LLMs with Reinforcement Learning, specifically Group Relative Policy Optimization (GRPO), has emerged as a paradigm for enhancing mathematical reasoning. However, standard GRPO relies on scalar correctness rewards …