Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
Researchers have introduced Hierarchical Advantage-Weighted Behavior Cloning (HABC), a method for online reinforcement learning (RL) fine-tuning of vision-language-action (VLA) models. HABC targets the setting where the only learning signal is a sparse, binary episode outcome. It separates the viability and efficiency objectives and uses a state-adaptive gate to balance them. The method also uses intervention-aware credit assignment, so the policy does not learn incorrectly from segments executed by external policies. In experiments on real-world robotic tasks, HABC achieved significantly higher success rates than standard supervised fine-tuning baselines.
IMPACT: This research offers a new approach to making RL fine-tuning more efficient and effective for complex robotic tasks where only sparse episode outcomes are available.
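The summary describes HABC only at a high level, so the following is a hypothetical, minimal sketch of how the three ingredients it names could fit together in a generic advantage-weighted behavior cloning loss. It is not the authors' implementation. It assumes that per-step viability and efficiency advantages, along with an estimated success probability, are already available from some critic. It uses a sigmoid of that estimate as the state-adaptive gate, forms clipped exponentiated-advantage weights in the style of generic advantage-weighted regression, and zeroes the weight of steps executed by an external policy. All names and parameters (`Step`, `viability_gate`, `bc_weights`, `weighted_bc_loss`, `threshold`, `sharpness`, `beta`, `max_weight`) are illustrative assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Step:
    # All fields are hypothetical inputs assumed to come from an upstream critic/logger.
    adv_viability: float   # advantage w.r.t. eventually succeeding
    adv_efficiency: float  # advantage w.r.t. succeeding efficiently
    p_success: float       # estimated success probability at this state, in [0, 1]
    intervened: bool       # True if an external policy executed this step


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def viability_gate(p_success: float, threshold: float = 0.5, sharpness: float = 10.0) -> float:
    """Hypothetical state-adaptive gate: close to 1 (favor viability) when success
    looks unlikely, close to 0 (favor efficiency) when success looks secure."""
    return 1.0 - sigmoid(sharpness * (p_success - threshold))


def bc_weights(steps: list[Step], beta: float = 1.0, max_weight: float = 20.0) -> list[float]:
    """Per-step behavior-cloning weights exp(A / beta), clipped at max_weight."""
    log_cap = math.log(max_weight)
    weights = []
    for s in steps:
        if s.intervened:
            # Intervention-aware choice made in this sketch: the learner did not
            # choose these actions, so they get no credit or blame.
            weights.append(0.0)
            continue
        g = viability_gate(s.p_success)
        adv = g * s.adv_viability + (1.0 - g) * s.adv_efficiency
        # Clip in log space so math.exp cannot overflow for large advantages.
        weights.append(math.exp(min(adv / beta, log_cap)))
    return weights


def weighted_bc_loss(log_probs: list[float], weights: list[float]) -> float:
    """Weighted negative log-likelihood of the logged actions, normalized by total weight."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / total


if __name__ == "__main__":
    steps = [
        Step(adv_viability=0.0, adv_efficiency=0.0, p_success=0.5, intervened=False),
        Step(adv_viability=2.0, adv_efficiency=-1.0, p_success=0.2, intervened=False),
        Step(adv_viability=5.0, adv_efficiency=5.0, p_success=0.5, intervened=True),
    ]
    w = bc_weights(steps)
    print(w)
    print(weighted_bc_loss([-1.0, -0.5, -2.0], w))
```

In this sketch, a low success estimate shifts the weighting toward the viability advantage, while a high one shifts it toward efficiency. Masking intervened steps is only one possible reading of "intervention-aware credit assignment"; the actual method may treat those segments differently.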