Brief · PulseAugur

RESEARCH · arXiv cs.AI · 5d · [6 sources]

Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent

Researchers have proposed two new classes of reinforcement learning policy that aim to be both more efficient and more expressive. The first, Score-Based One-step MeanFlow Policy Optimization (SOM), constructs a target velocity field from Q-function scores and a probability flow ODE; it is reported to reach state-of-the-art performance in online RL while reducing training and inference time. The second, Stochastic MeanFlow Policies (SMFP), is a one-step generative policy class that maps noise to actions through a MeanFlow transformation and provides a unified objective for stable, exploratory policy improvement in off-policy settings.
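To make "maps noise to actions in one step" concrete, here is a minimal sketch of one-step sampling under the general MeanFlow rule z_r = z_t - (t - r) * u(z_t, r, t), evaluated at r = 0, t = 1. The brief does not describe the actual SMFP network or parameterization, so everything here is an assumption for illustration: `init_params`, `mean_velocity`, and `sample_action` are hypothetical names, and the toy two-layer network with random weights stands in for a trained average-velocity model.

```python
import math
import random


def init_params(state_dim, action_dim, hidden=16, seed=0):
    """Random weights for a toy two-layer network (hypothetical, untrained)."""
    rng = random.Random(seed)
    in_dim = action_dim + state_dim + 2  # inputs: z, state, and the time pair (r, t)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(in_dim)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(action_dim)] for _ in range(hidden)]
    return {"w1": w1, "w2": w2, "action_dim": action_dim}


def mean_velocity(params, z, r, t, state):
    """Stand-in for a learned average velocity u(z, r, t | state)."""
    x = list(z) + list(state) + [r, t]
    w1, w2 = params["w1"], params["w2"]
    hidden = [
        math.tanh(sum(xi * w1[i][j] for i, xi in enumerate(x)))
        for j in range(len(w1[0]))
    ]
    return [
        sum(hj * w2[j][k] for j, hj in enumerate(hidden))
        for k in range(len(w2[0]))
    ]


def sample_action(params, state, rng):
    """One-step sampling: draw noise z at t = 1, then jump straight to t = 0."""
    z = [rng.gauss(0.0, 1.0) for _ in range(params["action_dim"])]
    u = mean_velocity(params, z, 0.0, 1.0, state)
    # z_0 = z_1 - (1 - 0) * u(z_1, 0, 1): a single network evaluation per action.
    return [zi - ui for zi, ui in zip(z, u)]


params = init_params(state_dim=3, action_dim=2)
action = sample_action(params, [0.1, -0.2, 0.3], random.Random(1))
```

Because the action depends on the sampled noise, the policy is stochastic even though the network itself is deterministic, and sampling costs one forward pass rather than the many integration steps of a multi-step flow or diffusion policy.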

IMPACT These new policy optimization techniques promise faster training and inference in reinforcement learning, potentially accelerating advancements in robotics and autonomous systems.

reinforcement learning
MuJoCo
Gaussian policies
Stochastic MeanFlow Policies
MeanFlow
Score-Based One-step MeanFlow Policy Optimization