Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications
Researchers have proposed DIBS, a method that decouples behavioral cloning from reinforcement learning to improve inductive generalization. It separates two learning problems: task-specific policies are learned on their own, and a higher-order policy-evolution function is learned separately. The evolution function is fit by behavioral cloning on state-action pairs drawn from teacher policies, so stable supervised targets replace noisy reward aggregation. The reported result is better training stability and zero-shot generalization than existing algorithms.
IMPACT: Enhances reinforcement learning generalization and training stability for complex tasks.
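The behavioral-cloning step mentioned in the summary can be illustrated with a minimal sketch. This is plain behavioral cloning only, not the DIBS algorithm: it fits a student policy to state-action pairs from a teacher by supervised cross-entropy, with no reward signal involved. The linear teacher, the synthetic Gaussian states, and every function name below are hypothetical choices made for illustration; none of them come from the source.

```python
import math
import random


def teacher_action(state, weights):
    """Pick the action whose linear score is highest (one weight row per action)."""
    scores = [sum(s * w for s, w in zip(state, row)) for row in weights]
    return scores.index(max(scores))


def make_dataset(n_samples, state_dim, n_actions, seed=0):
    """Hypothetical teacher: a fixed random linear policy labelling random states."""
    rng = random.Random(seed)
    teacher_w = [[rng.gauss(0, 1) for _ in range(state_dim)] for _ in range(n_actions)]
    states = [[rng.gauss(0, 1) for _ in range(state_dim)] for _ in range(n_samples)]
    actions = [teacher_action(s, teacher_w) for s in states]
    return states, actions


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_by_cloning(states, actions, n_actions, lr=0.5, epochs=200):
    """Full-batch gradient descent on cross-entropy against the teacher's actions."""
    dim = len(states[0])
    n = len(states)
    w = [[0.0] * dim for _ in range(n_actions)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(n_actions)]
        for s, a in zip(states, actions):
            logits = [sum(si * wi for si, wi in zip(s, row)) for row in w]
            probs = softmax(logits)
            for k in range(n_actions):
                # Gradient of cross-entropy w.r.t. logit k is (prob - one_hot).
                err = probs[k] - (1.0 if k == a else 0.0)
                for j in range(dim):
                    grad[k][j] += err * s[j] / n
        for k in range(n_actions):
            for j in range(dim):
                w[k][j] -= lr * grad[k][j]
    return w


def agreement(weights, states, actions):
    """Fraction of states where the student picks the teacher's action."""
    hits = sum(1 for s, a in zip(states, actions) if teacher_action(s, weights) == a)
    return hits / len(states)


states, actions = make_dataset(n_samples=300, state_dim=4, n_actions=3)
student_w = fit_by_cloning(states, actions, n_actions=3)
train_agreement = agreement(student_w, states, actions)
print(f"student/teacher agreement: {train_agreement:.2f}")
```

Because the targets are the teacher's actions rather than sampled returns, the loss here is an ordinary supervised objective, which is the kind of stable supervision the summary contrasts with noisy reward aggregation.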