Reinforcement Learning for Flow-Matching Policies with Density Transport
New research advances flow matching models through theoretical and algorithmic improvements
By the PulseAugur Editorial Team · 8 sources
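Researchers have developed new theoretical foundations and practical algorithms for flow matching, a class of generative models. One paper establishes convergence guarantees for gradient descent when the conditional velocity field is parameterized by a neural network, and provides generalization bounds. Another introduces Flow-DPPO, an improved reinforcement learning method that replaces ratio clipping with divergence proximal constraints to make training more stable and efficient. A third method, RLDT, uses reinforcement learning with density transport to fine-tune flow-matching policies for continuous-control tasks and outperforms existing baselines.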
arXiv:2606.10089v1 Announce Type: cross Abstract: In this work, we develop theoretical foundation for flow matching with neural-network-parameterized conditional velocity fields. We establish convergence guarantees for gradient descent in the over-parameterized 2-layered ReLU neu…
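To make the setting in this abstract concrete, here is a minimal pure-Python sketch of conditional flow matching trained by full-batch gradient descent on a two-layer ReLU velocity network. It assumes the linear interpolation path x_t = (1 - t) x0 + t x1, whose conditional velocity target is x1 - x0, a toy 1-D source N(0, 1) and data distribution N(2, 0.5^2), and fixed output weights of ±1/sqrt(width) with only the hidden weights trained. These choices, and the names TwoLayerReLU, make_batch, cfm_loss, and gd_step, are illustrative assumptions rather than the paper's construction.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


class TwoLayerReLU:
    """Velocity field v(x, t) = sum_j a_j * relu(w_j . [x, t, 1]).

    Assumed setup: output weights a_j are fixed at +-1/sqrt(width) and only the
    hidden weights w_j are trained, as is common in over-parameterized analyses.
    """

    def __init__(self, width, rng):
        self.a = [(1.0 if j % 2 == 0 else -1.0) / math.sqrt(width) for j in range(width)]
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(width)]

    def pre_activations(self, x, t):
        feats = (x, t, 1.0)
        return [sum(wk * fk for wk, fk in zip(w, feats)) for w in self.w]

    def __call__(self, x, t):
        return sum(a * relu(z) for a, z in zip(self.a, self.pre_activations(x, t)))


def make_batch(n, rng):
    """Sample (x_t, t, u) with x_t = (1 - t) * x0 + t * x1 and target u = x1 - x0.

    Toy choice: source x0 ~ N(0, 1), data x1 ~ N(2, 0.5^2).
    """
    batch = []
    for _ in range(n):
        x0, x1, t = rng.gauss(0.0, 1.0), rng.gauss(2.0, 0.5), rng.random()
        batch.append(((1.0 - t) * x0 + t * x1, t, x1 - x0))
    return batch


def cfm_loss(model, batch):
    """Empirical conditional flow matching loss: mean of (v(x_t, t) - u)^2."""
    return sum((model(x, t) - u) ** 2 for x, t, u in batch) / len(batch)


def gd_step(model, batch, lr):
    """One full-batch gradient-descent step on the hidden weights w_j."""
    n = len(batch)
    grads = [[0.0, 0.0, 0.0] for _ in model.w]
    for x, t, u in batch:
        feats = (x, t, 1.0)
        zs = model.pre_activations(x, t)
        resid = sum(a * relu(z) for a, z in zip(model.a, zs)) - u
        for j, z in enumerate(zs):
            if z > 0.0:  # ReLU derivative is 1 on the active side, 0 otherwise
                for k in range(3):
                    grads[j][k] += 2.0 * resid * model.a[j] * feats[k] / n
    for w, g in zip(model.w, grads):
        for k in range(3):
            w[k] -= lr * g[k]


rng = random.Random(0)
model = TwoLayerReLU(width=32, rng=rng)
batch = make_batch(64, rng)
initial = cfm_loss(model, batch)
for _ in range(100):
    gd_step(model, batch, lr=0.05)
print(f"CFM loss: {initial:.3f} -> {cfm_loss(model, batch):.3f}")
```

The loss cannot reach zero: x1 - x0 is random given (x_t, t), so the best a network can do is regress onto its conditional mean, which is the marginal velocity field that flow matching is after.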
arXiv:2606.11025v1 Announce Type: new Abstract: Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GRPO and CPS cast the denoising process as a Markov Decision Process and apply PPO…
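Under the MDP view described in this abstract, each denoising step can be treated as a stochastic action. The sketch below assumes, as in SDE-based samplers, that a step draws x_{t-dt} from an isotropic Gaussian N(mean(x_t, t), sigma^2 I) with a shared sigma, so the per-step KL between the old and new policies has a closed form. It contrasts PPO's clipped surrogate with a hypothetical divergence-proximal surrogate that gates each sample on that exact KL. The gating rule, delta, the numeric means, and all function names are assumptions for illustration, not Flow-DPPO's published objective.

```python
import math


def gaussian_log_prob(a, mean, sigma):
    """Log-density of the isotropic Gaussian N(mean, sigma^2 I) at action a."""
    d = len(a)
    sq = sum((ai - mi) ** 2 for ai, mi in zip(a, mean))
    return -sq / (2.0 * sigma ** 2) - d * math.log(sigma) - 0.5 * d * math.log(2.0 * math.pi)


def exact_kl(mean_old, mean_new, sigma):
    """Closed-form KL(N(mean_old, sigma^2 I) || N(mean_new, sigma^2 I)) with a shared sigma."""
    return sum((o - n) ** 2 for o, n in zip(mean_old, mean_new)) / (2.0 * sigma ** 2)


def clipped_surrogate(ratio, adv, eps=0.2):
    """Standard PPO clipped objective for one sample (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped_ratio * adv)


def divergence_proximal_surrogate(ratio, adv, kl, delta=0.05):
    """Hypothetical divergence-proximal variant: keep the unclipped ratio * adv
    while the exact per-step KL stays within delta, and drop the sample otherwise
    (in an autodiff framework this removes its gradient contribution)."""
    return ratio * adv if kl <= delta else 0.0


# One denoising step as an MDP action: state (x_t, t), action x_{t-dt} drawn
# from N(mean(x_t, t), sigma^2 I). The means and advantage are made-up numbers.
sigma = 0.1
mean_old, mean_new = [0.0, 0.0], [0.02, -0.01]
action = [0.05, 0.03]
adv = 1.0
ratio = math.exp(gaussian_log_prob(action, mean_new, sigma)
                 - gaussian_log_prob(action, mean_old, sigma))
kl = exact_kl(mean_old, mean_new, sigma)
print(f"ratio={ratio:.3f}  KL={kl:.4f}")
print("PPO clipped:", clipped_surrogate(ratio, adv))
print("divergence-proximal:", divergence_proximal_surrogate(ratio, adv, kl))
```

The design difference is what gets constrained: clipping bounds the probability ratio of the sampled action only, whereas a divergence constraint bounds how far the whole per-step distribution moves, which is cheap to evaluate exactly when both policies are Gaussians with the same variance.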
arXiv cs.AI
Boshu Lei, Kostas Daniilidis, Antonio Loquercio
arXiv:2606.08602v1 Announce Type: cross Abstract: We present an online reinforcement learning (RL) algorithm for fine-tuning flow-matching policies in continuous-control problems. Our key insight is to view RL-based policy improvement as a transport of action densities towards regions of high reward, which naturally aligns with …
Flow-DPPO replaces ratio clipping with divergence proximal constraints in flow matching models, improving training stability and multi-objective optimization through exact KL divergence computation.
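One standard way to formalize RLDT's idea of transporting action densities towards regions of high reward is the KL-regularized improvement target pi*(a) proportional to pi(a) * exp(Q(a) / beta). The sketch below approximates that target by self-normalized importance weighting and resampling of samples drawn from the current policy. It is a generic illustration of density-shifting policy improvement, not RLDT's algorithm; the toy reward, beta, the Gaussian stand-in for the flow policy, and the function names are hypothetical.

```python
import math
import random


def toy_reward(a):
    """Hypothetical 1-D reward peaked at a = 1.5 (stand-in for a learned critic Q(s, a))."""
    return -(a - 1.5) ** 2


def improvement_weights(actions, beta):
    """Self-normalized weights for pi*(a) proportional to pi(a) * exp(Q(a) / beta),
    given actions already sampled from the current policy pi."""
    logits = [toy_reward(a) / beta for a in actions]
    top = max(logits)  # subtract the max before exponentiating for numerical stability
    w = [math.exp(l - top) for l in logits]
    total = sum(w)
    return [wi / total for wi in w]


def transport_samples(actions, beta, rng):
    """Resample the current policy's actions under pi*; the resampled set is one way
    to obtain target samples for refitting a flow-matching policy (an assumption)."""
    return rng.choices(actions, weights=improvement_weights(actions, beta), k=len(actions))


rng = random.Random(0)
actions = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # stand-in for flow-policy samples
weights = improvement_weights(actions, beta=0.5)
moved = transport_samples(actions, beta=0.5, rng=rng)
print(f"mean before: {sum(actions) / len(actions):.2f}")
print(f"weighted mean under pi*: {sum(w * a for w, a in zip(weights, actions)):.2f}")
print(f"mean after resampling: {sum(moved) / len(moved):.2f}")
```

In this toy case pi* is itself Gaussian: multiplying N(0, 1) by exp(-2 (a - 1.5)^2) gives precision 5 and mean (0 * 1 + 1.5 * 4) / 5 = 1.2, so both post-transport means should land near 1.2. Smaller beta pushes the mass closer to the reward peak at 1.5; larger beta keeps pi* closer to the current policy.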
arXiv:2606.11155v1 Announce Type: new Abstract: Flow Matching models have demonstrated strong performance across a wide range of generative tasks. However, their reliance on ODE-based iterative sampling incurs substantial computational overhead in inference, which limits their applicability in real-time scenes. While distillat…
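The inference overhead described in this abstract comes from sampling by numerically integrating an ODE, with one velocity-network evaluation per solver step. A minimal sketch using explicit Euler on a hypothetical toy field dx/dt = x (exact endpoint x0 * e at t = 1) shows the trade-off: fewer steps mean fewer function evaluations (NFE) but a larger integration error. The function names are illustrative, and a real model would replace toy_velocity with a neural network forward pass.

```python
import math


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with explicit Euler.

    Returns the endpoint and the number of function evaluations (NFE); in a
    flow-matching model each evaluation is one forward pass of the network.
    """
    x, h, nfe = x0, 1.0 / num_steps, 0
    for i in range(num_steps):
        x = x + h * velocity(x, i * h)
        nfe += 1
    return x, nfe


def toy_velocity(x, t):
    """Hypothetical toy field dx/dt = x, whose exact flow maps x0 to x0 * e at t = 1."""
    return x


exact = math.e  # exact endpoint for x0 = 1.0
for steps in (1, 4, 16, 64):
    x1, nfe = euler_sample(toy_velocity, 1.0, steps)
    print(f"steps={steps:3d}  NFE={nfe:3d}  |error|={abs(x1 - exact):.4f}")
```

Euler is first-order, so the error shrinks roughly in proportion to the step size while the NFE, and with it the inference cost, grows linearly with the number of steps.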