PulseAugur

New reinforcement learning method optimizes agent training by controlling rollout pass rate

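Researchers have developed a technique called Prefix Sampling (PS) to make reinforcement learning (RL) for AI agents more efficient. Rollout groups with skewed pass rates waste compute because binary rewards then carry little contrast; PS addresses this by steering rollout groups toward a 50% pass rate, which maximizes reward entropy and the contrastive signal. On SWE-bench tasks, PS achieved a 2.01x speedup on Qwen3-14B and a 1.55x speedup on Qwen3-32B, while also improving validation performance.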
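The claim that a 50% pass rate is the most informative regime follows from basic properties of a binary (Bernoulli) reward: both its entropy and its variance peak at p = 0.5 and fall to zero when every rollout in a group passes or every rollout fails. The sketch below only illustrates that property; it is not the paper's Prefix Sampling implementation, and the function names are hypothetical.

```python
import math


def bernoulli_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a binary reward with pass rate p."""
    if p in (0.0, 1.0):
        return 0.0  # an all-pass or all-fail group carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def reward_variance(p: float) -> float:
    """Variance of a binary reward with pass rate p."""
    return p * (1 - p)


def group_pass_rate(rewards: list[int]) -> float:
    """Fraction of rollouts in a group that received reward 1."""
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    # Hypothetical rollout groups of size 8 with different pass rates.
    groups = {
        "all fail": [0, 0, 0, 0, 0, 0, 0, 0],
        "skewed": [1, 0, 0, 0, 0, 0, 0, 0],
        "balanced": [1, 1, 1, 1, 0, 0, 0, 0],
        "all pass": [1, 1, 1, 1, 1, 1, 1, 1],
    }
    for name, rewards in groups.items():
        p = group_pass_rate(rewards)
        print(f"{name:9s} p={p:.3f} "
              f"entropy={bernoulli_entropy(p):.3f} bits "
              f"variance={reward_variance(p):.3f}")
```

Groups at either extreme give every rollout the same reward, so there is nothing to contrast within the group; the balanced group sits at the maximum of both measures.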

Impact: Introduces a method for accelerating agentic RL training, which could lower the compute cost of developing complex AI systems.

Ranking rationale: This is an academic paper detailing a new reinforcement learning method.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.LG TIER_1 English(EN) · Tianshu Zhu, Wenyu Zhang, Xiaoying Zuo, Lun Tian, Haotian Zhao, Yucheng Zeng, Jingnan Gu, Daxiang Dong, Jianmin Wu, Dawei Yin, Dou Shen ·

    Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime

    arXiv:2605.05112v1 Announce Type: new Abstract: SWE-bench-style agentic reinforcement learning relies on expensive stateful trajectories, yet substantial compute is wasted on sampled rollout groups with skewed pass rates, where binary rewards provide a weak contrastive signal. We…

  2. arXiv cs.LG TIER_1 English(EN) · Dou Shen ·

    Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime

    SWE-bench-style agentic reinforcement learning relies on expensive stateful trajectories, yet substantial compute is wasted on sampled rollout groups with skewed pass rates, where binary rewards provide a weak contrastive signal. We frame this inefficiency as a pass-rate control …