Reinforcement Learning for Flow-Matching Policies with Density Transport

New research advances flow-matching models through theoretical and algorithmic improvements

By the PulseAugur editorial team · [11 sources] · 2026-06-07 12:28

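Researchers have developed new theoretical foundations and practical algorithms for flow-matching models, a class of generative models. One paper establishes convergence guarantees for neural-network-parameterized conditional velocity fields and provides generalization bounds. Another introduces Flow-DPPO, a modified reinforcement learning method that replaces ratio clipping with divergence proximal constraints to make training more stable and efficient. A third method, RLDT, uses reinforcement learning with density transport to fine-tune flow-matching policies for continuous-control tasks and outperforms existing baselines.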

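Impact: These advances in flow-matching models could lead to more efficient and more stable generative AI for tasks such as image and video generation, and to better performance on continuous-control problems.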

Ranking rationale: Multiple arXiv papers detail new theoretical frameworks and algorithmic improvements for flow-matching models.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 11 sources.

Sources [11]

arXiv cs.AI TIER_1 English(EN) · Jianming Ma, Qiyue Yang, Yang Zhang, Liyun Yan, Zhanxiang Cao, Yazhou Zhang, Yue Gao · 2026-06-12 04:00

PolyFlow: Safe and Efficient Polytope-Constrained Flow Matching with Constraint Embedding and Projection-Free Updates

arXiv:2606.13400v1 Announce Type: cross Abstract: While flow-based generative models have demonstrated strong performance across a wide range of domains, deploying them in safety-critical physical systems remains challenging due to strict constraint requirements. Existing approac…
arXiv cs.AI TIER_1 English(EN) · Yue Gao · 2026-06-11 14:30

PolyFlow: Safe and Efficient Polytope-Constrained Flow Matching with Constraint Embedding and Projection-Free Updates

While flow-based generative models have demonstrated strong performance across a wide range of domains, deploying them in safety-critical physical systems remains challenging due to strict constraint requirements. Existing approaches typically enforce safety through post-hoc corr…
arXiv cs.LG TIER_1 English(EN) · Zeyang Li, Sunbochen Tang, Navid Azizan · 2026-06-11 04:00

Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies

arXiv:2601.08136v2 Announce Type: replace Abstract: Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty that distinguishes onli…
arXiv cs.AI TIER_1 English(EN) · Yihan He, Qishuo Yin, Yuan Cao, Jianqing Fan, Han Liu · 2026-06-10 04:00

On the Theory of Flow Matching with Neural Networks

arXiv:2606.10089v1 Announce Type: cross Abstract: In this work, we develop theoretical foundation for flow matching with neural-network-parameterized conditional velocity fields. We establish convergence guarantees for gradient descent in the over-parameterized 2-layered ReLU neu…
arXiv cs.LG TIER_1 English(EN) · Bowen Ping, Xiangxin Zhou, Penghui Qi, Minnan Luo, Liefeng Bo, Tianyu Pang · 2026-06-10 04:00

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

arXiv:2606.11025v1 Announce Type: new Abstract: Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GRPO and CPS cast the denoising pr…
arXiv cs.LG TIER_1 English(EN) · Tianyu Pang · 2026-06-09 15:59

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GRPO and CPS cast the denoising process as a Markov Decision Process and apply PPO…
arXiv cs.AI TIER_1 English(EN) · Boshu Lei, Kostas Daniilidis, Antonio Loquercio · 2026-06-09 04:00

Reinforcement Learning for Flow-Matching Policies with Density Transport

arXiv:2606.08602v1 Announce Type: cross Abstract: We present an online reinforcement learning (RL) algorithm for fine-tuning flow-matching policies in continuous-control problems. Our key insight is to view RL-based policy improvement as a transport of action densities towards re…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-09 00:00

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Flow-DPPO replaces ratio clipping with divergence proximal constraints in flow matching models, improving training stability and multi-objective optimization through exact KL divergence computation.
arXiv cs.AI TIER_1 English(EN) · Antonio Loquercio · 2026-06-07 12:28

Reinforcement Learning for Flow-Matching Policies with Density Transport

We present an online reinforcement learning (RL) algorithm for fine-tuning flow-matching policies in continuous-control problems. Our key insight is to view RL-based policy improvement as a transport of action densities towards regions of high reward, which naturally aligns with …
arXiv cs.CV TIER_1 English(EN) · An Zhao, Shengyuan Zhang, Zhongjian Sun, Yixiang Zhou, Zejian Li, Ling Yang, Tianrun Chen, Lingyun Sun · 2026-06-10 04:00

Mean Flow Distillation: A Robust and Stable Distillation Method for Flow Matching Models

arXiv:2606.11155v1 Announce Type: new Abstract: Flow Matching models have demonstrated strong performance across a wide range of generative tasks. However, their reliance on ODE-based iterative sampling incurs substantial computational overhead in inference, which limits their ap…
arXiv cs.CV TIER_1 English(EN) · Lingyun Sun · 2026-06-09 17:39

Mean Flow Distillation: A Robust and Stable Distillation Method for Flow Matching Models

Flow Matching models have demonstrated strong performance across a wide range of generative tasks. However, their reliance on ODE-based iterative sampling incurs substantial computational overhead in inference, which limits their applicability in real-time scenes. While distillat…
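
For readers unfamiliar with the contrast drawn in the Flow-DPPO entries above (ratio clipping versus a divergence proximal constraint), the following is a minimal, generic sketch and not the paper's implementation. It assumes each denoising step is an isotropic Gaussian with a shared standard deviation, in which case the KL divergence between the old and new step distributions has a closed form. It also assumes the proximal constraint is imposed as a simple penalty with weight `beta`. All function names and parameters here are hypothetical.

```python
import math


def gaussian_log_prob(x, mean, sigma):
    """Log density of an isotropic Gaussian N(mean, sigma^2 I) evaluated at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / sigma**2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)


def gaussian_kl(mean_old, mean_new, sigma):
    """Exact KL(N(mean_old, s^2 I) || N(mean_new, s^2 I)) when both share sigma."""
    sq = sum((a - b) ** 2 for a, b in zip(mean_old, mean_new))
    return sq / (2.0 * sigma**2)


def probability_ratio(x, mean_old, mean_new, sigma):
    """Likelihood ratio pi_new(x) / pi_old(x) for one sampled step x."""
    return math.exp(
        gaussian_log_prob(x, mean_new, sigma) - gaussian_log_prob(x, mean_old, sigma)
    )


def clipped_surrogate(x, mean_old, mean_new, sigma, advantage, eps=0.2):
    """PPO-style objective: the ratio is clipped to [1 - eps, 1 + eps]."""
    ratio = probability_ratio(x, mean_old, mean_new, sigma)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def kl_penalized_surrogate(x, mean_old, mean_new, sigma, advantage, beta=1.0):
    """Assumed proximal variant: unclipped ratio minus a closed-form KL penalty."""
    ratio = probability_ratio(x, mean_old, mean_new, sigma)
    return ratio * advantage - beta * gaussian_kl(mean_old, mean_new, sigma)
```

In this toy setting the clipped objective stops rewarding a sample once its ratio leaves the clip range, whereas the penalized objective discourages large moves of the step mean through the KL term, whichever sample was drawn. How Flow-DPPO actually formulates and enforces its constraint is described in the paper itself.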
