Trust-Region Diffusion Policies for Massively Parallel On-Policy RL
Researchers have introduced Trust-region Diffusion Policies (TruDi), a framework for training diffusion policies in massively parallel, on-policy reinforcement learning (RL). In on-policy RL the data distribution shifts rapidly as the policy updates, which makes complex policies hard to train stably; TruDi addresses this by incorporating a trust-region optimization rule. Empirical evaluations across four benchmarks and 73 tasks show that TruDi matches or surpasses existing baselines, with particular strength on complex humanoid control tasks.
AI IMPACT Enables more expressive and stable policy training in massively parallel RL environments, potentially accelerating progress in complex control tasks.
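
For readers unfamiliar with trust-region updates, the sketch below shows one widely used generic form: the clipped surrogate objective from PPO. It is not TruDi's rule, which the summary does not specify. The function name `clipped_surrogate` and the parameter `clip_eps` are illustrative assumptions, and the sketch assumes per-sample log-probabilities are available, which is a simplification for diffusion policies.

```python
import numpy as np


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Generic PPO-style clipped objective (illustration only, not TruDi).

    logp_new, logp_old: log-probabilities of the sampled actions under the
        updated policy and the data-collecting policy (assumed available).
    advantages: advantage estimates for the same samples.
    clip_eps: hypothetical trust-region width on the probability ratio.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio between the new and old policy for each sample.
    ratio = np.exp(logp_new - logp_old)

    # Unclipped objective and the version with the ratio held in
    # [1 - clip_eps, 1 + clip_eps].
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Taking the element-wise minimum removes any gain from moving the
    # ratio outside the clipping interval.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. Once the ratio leaves the clipping interval in the direction that would raise the objective, the clipped term takes over and the gradient with respect to the policy vanishes, which is what keeps each update close to the data-collecting policy.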