Pave-GRPO: Beyond Instantaneous Guidance through Principled Average Velocity Decomposition

Pave-GRPO strengthens generative model alignment through velocity decomposition

By PulseAugur Editorial · [1 source] · 2026-06-02 04:00

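Researchers have introduced Pave-GRPO, a new method for improving the alignment of flow-based generative models with human preferences. The technique reformulates the GRPO objective by decomposing coarse transitions into finer sub-trajectories, so that reward feedback reaches more intermediate denoising steps. This increases the granularity of alignment without adding generation cost, yielding more comprehensive preference optimization and performance gains across a range of reward settings.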
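As a rough illustration of the two ideas named in the summary, the sketch below shows (a) the standard group-relative advantage used in GRPO, where each sample's reward is normalized against its group's mean and standard deviation, and (b) the elementary identity that the average velocity over a coarse interval equals the time-weighted mean of the average velocities over its sub-intervals. This is not the paper's implementation: the function names, the 1-D state, and the way the pieces are shown here are assumptions made purely for illustration.

```python
# Illustrative sketch only; all names are hypothetical and this is
# not the method as implemented in the Pave-GRPO paper.

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


def average_velocity(x_start, x_end, t_start, t_end):
    """Average velocity of a 1-D state over [t_start, t_end]."""
    return (x_end - x_start) / (t_end - t_start)


def decompose(ts, xs):
    """Split one coarse transition into sub-intervals.

    ts, xs: times and states along the trajectory, endpoints included.
    Returns the coarse average velocity and a list of
    (time_weight, sub_interval_average_velocity) pairs.
    """
    total = ts[-1] - ts[0]
    coarse = average_velocity(xs[0], xs[-1], ts[0], ts[-1])
    parts = []
    for i in range(len(ts) - 1):
        weight = (ts[i + 1] - ts[i]) / total
        sub = average_velocity(xs[i], xs[i + 1], ts[i], ts[i + 1])
        parts.append((weight, sub))
    return coarse, parts


if __name__ == "__main__":
    # Assumed toy trajectory: one coarse step split into three sub-steps.
    ts = [0.0, 0.25, 0.5, 1.0]
    xs = [0.0, 1.0, 1.5, 4.0]
    coarse, parts = decompose(ts, xs)
    recombined = sum(w * v for w, v in parts)
    print(coarse, recombined)
    print(group_relative_advantages([1.0, 2.0, 3.0]))
```

The point of the identity is that a training signal attached to one coarse transition can, in principle, be expressed in terms of its finer sub-intervals without sampling additional trajectories; how Pave-GRPO actually does this is described in the paper itself.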

Impact: Strengthens preference alignment in generative models, which may lead to more fine-grained and controllable AI outputs.

Ranking rationale: This is a paper detailing a new method for generative models.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV TIER_1 English(EN) · Pengyang Ling, Jiazi Bu, Yujie Zhou, Yibin Wang, Zhenyu Hu, Zihan Zhang, Yi Jin, Huaian Chen, Yuhang Zang · 2026-06-02 04:00

Pave-GRPO: Beyond Instantaneous Guidance through Principled Average Velocity Decomposition

arXiv:2606.01636v1 Announce Type: new Abstract: Post-training via Group Relative Policy Optimization (GRPO) has emerged as a powerful paradigm for aligning flow-based generative models with human preferences. However, the iterative denoising nature of flow models incurs substanti…