New RL policies boost efficiency with one-step generative control

By PulseAugur Editorial · [6 sources] · 2026-05-20 15:14

Two recent arXiv papers propose one-step generative policies for reinforcement learning that aim to keep the expressiveness of diffusion and flow-matching policies without the cost of multi-step denoising at inference time. The first, Score-Based One-step MeanFlow Policy Optimization (SOM), constructs a target velocity field from Q-function scores and a probability flow ODE, and is reported to reach state-of-the-art performance in online RL with reduced training and inference times. The second, Stochastic MeanFlow Policies (SMFP), is a one-step generative policy class that maps noise to actions through a MeanFlow transformation and provides a unified objective for stable, exploratory policy improvement in off-policy settings.
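To make "one-step" concrete, here is a minimal sketch of mapping noise to an action with a single evaluation of an average velocity, compared with a multi-step Euler rollout of the instantaneous velocity. It is not the implementation from either paper. It assumes the linear path z_t = (1 - t) * a + t * noise (noise at t = 1, action at t = 0) and a toy closed-form velocity for a single target action per state. The names `target_action`, `average_velocity`, `one_step_action`, and `multi_step_action` are hypothetical.

```python
import numpy as np


def target_action(state):
    # Hypothetical stand-in for the action a trained policy should output.
    return np.tanh(state)


def average_velocity(z_t, r, t, state):
    # Toy assumption: with a single target action per state, every path
    # z_t = (1 - t) * a + t * noise is a straight line, so the velocity is
    # constant along it and its average over [r, t] equals (z_t - a) / t.
    # A learned model would be a neural network taking (z_t, r, t, state).
    return (z_t - target_action(state)) / t


def one_step_action(state, rng):
    # One network evaluation: action = noise - u(noise, r=0, t=1).
    noise = rng.standard_normal(state.shape)
    return noise - average_velocity(noise, 0.0, 1.0, state)


def multi_step_action(state, rng, n_steps=10):
    # Baseline for comparison: Euler integration from t=1 down to t=0,
    # which needs n_steps velocity evaluations instead of one.
    z = rng.standard_normal(state.shape)
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = (z - target_action(state)) / t  # instantaneous velocity at (z, t)
        z = z - (t - t_next) * v
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.array([0.5, -1.0])
    print("one step  :", one_step_action(state, rng))
    print("multi step:", multi_step_action(state, rng))
```

In this toy setting both samplers land on the same action, but the one-step version calls the velocity function once rather than `n_steps` times. That difference is the inference-time saving the summary refers to.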

IMPACT These new policy optimization techniques promise faster training and inference in reinforcement learning, potentially accelerating advancements in robotics and autonomous systems.

RANK_REASON The cluster contains two academic papers detailing new methods in reinforcement learning.

AI-generated summary · Google Gemini · from 6 sources.

COVERAGE [6]

arXiv cs.AI TIER_1 English(EN) · Kyungyoon Kim, Donghyeon Ki, Hee-Jun Ahn, Byung-Jun Lee · 2026-05-25 04:00

Score-Based One-step MeanFlow Policy Optimization

arXiv:2605.23365v1. Diffusion and flow matching have emerged as expressive policy classes in reinforcement learning, but their reliance on multi-step denoising imposes substantial computational overhead at inference time, which is particularly proble…

arXiv cs.AI TIER_1 English(EN) · Byung-Jun Lee · 2026-05-22 08:28

Score-Based One-step MeanFlow Policy Optimization

Diffusion and flow matching have emerged as expressive policy classes in reinforcement learning, but their reliance on multi-step denoising imposes substantial computational overhead at inference time, which is particularly problematic in online RL. MeanFlow offers a promising al…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-22 08:28

Score-Based One-step MeanFlow Policy Optimization

Diffusion and flow matching have emerged as expressive policy classes in reinforcement learning, but their reliance on multi-step denoising imposes substantial computational overhead at inference time, which is particularly problematic in online RL. MeanFlow offers a promising al…

arXiv cs.AI TIER_1 English(EN) · Zeyuan Wang, Da Li, Yulin Chen, Yuehu Gong, Yanming Guo, Ye Shi, Liang Bai, Tianyuan Yu, Yanwei Fu · 2026-05-22 04:00

Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent

arXiv:2605.21282v2. Online off-policy reinforcement learning (RL) is shaped by two coupled choices: the policy class and the update rule. Gaussian policies are fast and have tractable entropy, but struggle with multimodal action distributions. Genera…

arXiv cs.AI TIER_1 English(EN) · Yanwei Fu · 2026-05-20 15:14

Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent

Online off-policy reinforcement learning (RL) is shaped by two coupled choices: the policy class and the update rule. Gaussian policies are fast and have tractable entropy, but struggle with multimodal action distributions. Generative policies are more expressive, but often requi…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 15:14

Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent

Online off-policy reinforcement learning (RL) is shaped by two coupled choices: the policy class and the update rule. Gaussian policies are fast and have tractable entropy, but struggle with multimodal action distributions. Generative policies are more expressive, but often requi…
