DecompRL: Solving Harder Problems by Learning Modular Code Generation

New reinforcement learning algorithm decomposes problems, cutting large language model costs

By the PulseAugur editorial team · [2 sources] · 2026-07-02 16:25

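Researchers have introduced DecompRL, a novel reinforcement learning algorithm designed to strengthen the problem-solving abilities of large language models (LLMs). Rather than relying on extensive sampling or diversity optimization, DecompRL focuses on decomposing complex problems into smaller, more manageable sub-functions. The algorithm learns to generate and recombine the code for these modules, significantly reducing the computational cost of finding a solution. The approach performs well on benchmarks such as LiveCodeBench and CodeContests, enabling LLMs to solve problems they previously could not.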
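The recombination idea can be pictured with a toy sketch. This is not the DecompRL algorithm itself: the candidate pools below are hand-written stand-ins for model samples, and the task, the names (`FILTER_CANDIDATES`, `REDUCE_CANDIDATES`, `search`), and the exhaustive search are all assumptions made purely for illustration. The sketch only shows how a small number of sampled sub-functions can cover a larger number of assembled programs.

```python
from itertools import product
from typing import Callable, List, Tuple

# Illustrative task (an assumption, not from the paper):
# return the sum of squares of the even numbers in a list.

# Hypothetical candidate pools; in a real system these would be model samples.
FILTER_CANDIDATES: List[Callable[[List[int]], List[int]]] = [
    lambda xs: [x for x in xs if x % 2 == 1],  # buggy: keeps odd numbers
    lambda xs: [x for x in xs if x % 2 == 0],  # correct: keeps even numbers
    lambda xs: list(xs),                       # buggy: filters nothing
]
REDUCE_CANDIDATES: List[Callable[[List[int]], int]] = [
    lambda xs: sum(xs),                        # buggy: forgets to square
    lambda xs: len(xs),                        # buggy: counts instead of summing
    lambda xs: sum(x * x for x in xs),         # correct: sum of squares
]

# (input, expected output) pairs acting as the verifiable reward signal.
TESTS: List[Tuple[List[int], int]] = [
    ([1, 2, 3, 4], 20),
    ([], 0),
    ([5, 7], 0),
    ([6], 36),
]


def passes(program: Callable[[List[int]], int]) -> bool:
    """Return True if the assembled program passes every test."""
    return all(program(list(inp)) == expected for inp, expected in TESTS)


def search() -> List[Tuple[int, int]]:
    """Try every filter/reducer pairing and return the index pairs that pass."""
    winners = []
    for (i, f), (j, g) in product(
        enumerate(FILTER_CANDIDATES), enumerate(REDUCE_CANDIDATES)
    ):
        # Bind f and g as defaults so each composed program keeps its own pair.
        program = lambda xs, f=f, g=g: g(f(xs))
        if passes(program):
            winners.append((i, j))
    return winners


winners = search()
modules_sampled = len(FILTER_CANDIDATES) + len(REDUCE_CANDIDATES)
programs_covered = len(FILTER_CANDIDATES) * len(REDUCE_CANDIDATES)
print(f"modules sampled: {modules_sampled}, programs covered: {programs_covered}")
print(f"passing combinations: {winners}")
```

In this toy setting, six sampled modules yield nine candidate programs, of which exactly one passes. Whether and how such a gap translates into GPU savings in practice depends on details of the paper that the summary does not cover.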

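Impact: This approach could significantly reduce the computational cost of LLM problem solving, allowing models to handle more complex tasks efficiently.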

Ranking rationale: This cluster contains an academic paper detailing a new algorithm for large language models.


AI-generated summary · Google Gemini · based on 2 sources.

Coverage sources [2]

arXiv cs.LG · Juliette Decugis, Fabian Gloeckle, Francis Bach, Taco Cohen, Gabriel Synnaeve · 2026-07-03 04:00

DecompRL: Solving Harder Problems by Learning Modular Code Generation

arXiv:2607.02390v1 · Announce Type: new · Abstract: How can Large Language Models (LLMs) solve problems they currently cannot? Repeated sampling scales test-time compute but GPU cost grows linearly with attempts, while reinforcement learning (RL) with verifiable rewards improves sing…

arXiv cs.LG · Gabriel Synnaeve · 2026-07-02 16:25

DecompRL: Solving Harder Problems by Learning Modular Code Generation

How can Large Language Models (LLMs) solve problems they currently cannot? Repeated sampling scales test-time compute but GPU cost grows linearly with attempts, while reinforcement learning (RL) with verifiable rewards improves single-attempt accuracy at the expense of sample div…
