New IH-GRPO algorithm strengthens LLM mathematical reasoning

By the PulseAugur editorial team · [1 source] · 2026-05-18 14:54

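Researchers have introduced IH-GRPO, a new algorithm that aims to improve the mathematical reasoning of large language models by decoupling tool invocation from immediate execution. Because tool output no longer interrupts the model mid-reasoning, the model can keep its reasoning coherent and expressive, and this yields notable gains on out-of-domain benchmarks. In experiments across several Qwen3 models, IH-GRPO achieved absolute improvements of up to 2.53% on mathematical reasoning tasks compared with existing methods.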
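The summary does not spell out how IH-GRPO implements the decoupling. The following is a minimal, hypothetical Python sketch of the general idea only. In the coupled style, each tool call is executed the moment it is emitted and its output is spliced into the context. In the decoupled style, the reasoning trace is written end to end with placeholders and the calls are executed afterwards. The `Text`/`ToolCall` types, the `add`/`mul` tools, and the `<rN>` placeholder syntax are illustrative assumptions, not the paper's interface.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Text:
    content: str  # a span of natural-language reasoning


@dataclass
class ToolCall:
    name: str              # hypothetical tool name, e.g. "mul"
    args: tuple[int, ...]  # arguments passed to the tool


Step = Union[Text, ToolCall]

# Hypothetical tool registry used only for this illustration.
TOOLS: dict[str, Callable[[tuple[int, ...]], int]] = {
    "add": lambda args: sum(args),
    "mul": lambda args: math.prod(args),
}


def run_coupled(plan: list[Step]) -> list[str]:
    """Coupled style: every tool call is executed as soon as it is emitted,
    and its output is spliced into the context before reasoning resumes."""
    context: list[str] = []
    for step in plan:
        if isinstance(step, ToolCall):
            context.append(f"[call {step.name}{step.args}]")
            # The observation interrupts the reasoning trace right here.
            context.append(f"[output {TOOLS[step.name](step.args)}]")
        else:
            context.append(step.content)
    return context


def run_decoupled(plan: list[Step]) -> tuple[list[str], dict[str, int]]:
    """Decoupled style: the trace is written end to end with symbolic
    placeholders; the collected tool calls are executed only afterwards."""
    trace: list[str] = []
    pending: list[ToolCall] = []
    for step in plan:
        if isinstance(step, ToolCall):
            trace.append(f"<r{len(pending)}>")  # placeholder, no execution yet
            pending.append(step)
        else:
            trace.append(step.content)
    # Deferred execution pass over all invoked tools.
    results = {f"<r{i}>": TOOLS[c.name](c.args) for i, c in enumerate(pending)}
    return trace, results


plan: list[Step] = [
    Text("Area is width times height."),
    ToolCall("mul", (12, 7)),
    Text("Half-perimeter is width plus height."),
    ToolCall("add", (12, 7)),
]

print(run_coupled(plan))
trace, results = run_decoupled(plan)
print(trace)
print(results)
```

In this toy plan the two calls are independent. A later step that needs an earlier result would have to refer to it symbolically (for example via `<r0>`), a detail the sketch does not model.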

Impact: Decoupling tool use from execution strengthens LLM reasoning and could improve performance on complex tasks.

Ranking rationale: This cluster consists of a single academic paper describing a new algorithm for LLM reasoning.

Read on arXiv cs.CL

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Guojun Yin · 2026-05-18 14:54

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

Large language models (LLMs) have increasingly leveraged tool invocation to enhance their reasoning capabilities. However, existing approaches typically tightly couple tool invocation with immediate execution. Such immediate tool interaction may disrupt the reasoning coherence of…
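The paper's title places IH-GRPO in the GRPO (Group Relative Policy Optimization) family of reinforcement-learning methods. The summary does not say how the "implicit hierarchical" component changes the objective, so the Python sketch below shows only the group-relative advantage used by standard GRPO, not IH-GRPO's own formulation. The binary correct/incorrect reward and the `eps` value are assumptions for illustration.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO-style advantage: normalize each sampled response's reward
    against the mean and standard deviation of its own group (same prompt)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled solutions to one math problem; assumed reward 1.0 = correct answer.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Correct samples receive positive advantages and incorrect ones negative advantages. When every sample in a group gets the same reward, all advantages are zero, so that group contributes no learning signal.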
