Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 11h

Generative OOD-regularized Model-based Policy Optimization

Researchers have developed a new offline reinforcement learning (RL) algorithm called Generative OOD-regularized Model-based Policy Optimization (GORMPO). The method uses generative models to explicitly estimate data density in sparse state-action spaces, with the aim of keeping the learned policy from taking out-of-distribution (OOD) actions. GORMPO restricts policy updates to high-density regions of the dataset. On a real-world medical dataset, it showed a 17% performance improvement over existing baselines.
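The core idea, using an estimated data density to decide which state-action pairs count as in-distribution, can be illustrated with a minimal sketch. This is not GORMPO's implementation. The brief does not say which generative model the authors use, so an isotropic Gaussian kernel density estimate stands in for it here. The sketch also swaps in one common way of keeping model-based updates in high-density regions: subtracting a penalty from model-predicted rewards wherever the estimated log-density falls below a threshold. The helper names `fit_gaussian_kde` and `penalized_reward`, the bandwidth, the 5% quantile threshold and the penalty size are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_gaussian_kde(data, bandwidth):
    """Fit an isotropic Gaussian KDE (a stand-in for a generative density model)
    and return a function that computes log-densities of query points."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # log of (1/n) * (2*pi*h^2)^(-d/2), the shared normalizing constant
    log_norm = -np.log(n) - 0.5 * d * np.log(2.0 * np.pi * bandwidth**2)

    def log_density(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        # Squared distances between each query (m) and each data point (n): shape (m, n)
        sq = ((x[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
        z = -0.5 * sq / bandwidth**2
        # Numerically stable log-sum-exp over data points
        zmax = z.max(axis=1, keepdims=True)
        return log_norm + zmax[:, 0] + np.log(np.exp(z - zmax).sum(axis=1))

    return log_density


def penalized_reward(reward, log_dens, threshold, penalty):
    """Hypothetical OOD regularizer: subtract `penalty` from the reward of any
    state-action pair whose log-density is below `threshold` (treated as OOD)."""
    ood = log_dens < threshold
    return np.where(ood, reward - penalty, reward), ood


# Toy usage with synthetic (state, action) pairs; all numbers are illustrative.
rng = np.random.default_rng(0)
dataset_sa = rng.normal(0.0, 1.0, size=(500, 3))
log_density = fit_gaussian_kde(dataset_sa, bandwidth=0.5)

# Flag anything less dense than the 5th percentile of the dataset itself as OOD.
threshold = np.quantile(log_density(dataset_sa), 0.05)

# One query at the center of the data, one far outside it.
queries = np.array([[0.0, 0.0, 0.0], [6.0, 6.0, 6.0]])
rewards = np.array([1.0, 1.0])
r_pen, ood = penalized_reward(rewards, log_density(queries), threshold, penalty=10.0)
print(ood, r_pen)
```

Here the point at the center of the data keeps its reward, while the far-away point is flagged as OOD and penalized. A policy optimized against such penalized model rollouts is pushed back toward regions the dataset actually covers. A learned generative model would take the KDE's place in higher-dimensional settings, where kernel density estimates scale poorly.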

IMPACT Introduces a novel method for safer offline reinforcement learning by leveraging generative models to avoid out-of-distribution actions.

reinforcement learning
generative models
offline RL
GORMPO
Generative OOD-regularized Model-based Policy Optimization