Brief · PulseAugur

RESEARCH · arXiv cs.LG English(EN) · 18h · [2 sources]

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Researchers have introduced Hierarchical Advantage-Weighted Behavior Cloning (HABC) to improve online reinforcement learning (RL) fine-tuning of vision-language-action models (VLAs). HABC targets the case where the only feedback is a sparse, binary outcome per episode: it separates a viability objective from an efficiency objective and uses a state-adaptive gate to balance the two. The method also applies intervention-aware credit assignment, so that the policy does not learn incorrectly from segments that were executed by an external policy. In experiments on real-world robotic tasks, HABC achieved higher success rates than standard supervised fine-tuning baselines.
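To make the ingredients named above concrete, here is a minimal sketch of how per-step behavior-cloning weights could be assembled from two advantage signals, a gate, and an intervention mask. The brief does not give the paper's formulas, so everything here is an assumption for illustration: the function name `habc_style_weights`, the combination rule `a_v + g * a_e`, the exponential weighting with temperature `beta`, the clipping value, and the choice to give zero weight to externally executed steps.

```python
import math


def habc_style_weights(adv_viability, adv_efficiency, gate, policy_executed,
                       beta=1.0, max_weight=20.0):
    """Illustrative per-step weights for advantage-weighted behavior cloning.

    Hypothetical sketch, not the paper's formulation.
    adv_viability   : per-step advantage for "can the task still succeed"
    adv_efficiency  : per-step advantage for "how quickly it succeeds"
    gate            : per-step value in [0, 1] scaling the efficiency term
    policy_executed : per-step bool, False where an external policy acted
    """
    weights = []
    for a_v, a_e, g, own in zip(adv_viability, adv_efficiency, gate, policy_executed):
        if not own:
            # Assumption: steps executed by an external policy get zero weight,
            # so the learner takes no credit or blame for them.
            weights.append(0.0)
            continue
        # Assumption: viability always counts; efficiency counts only as far
        # as the state-dependent gate allows.
        combined = a_v + g * a_e
        # Exponential advantage weighting, clipped for numerical stability.
        weights.append(min(math.exp(combined / beta), max_weight))
    return weights


# Usage: three steps, the last one executed by an external policy.
example = habc_style_weights(
    adv_viability=[0.0, 1.0, 0.5],
    adv_efficiency=[1.0, 1.0, 2.0],
    gate=[0.0, 1.0, 0.5],
    policy_executed=[True, True, False],
)
```

These weights would multiply the per-step imitation loss during fine-tuning. In this sketch the gate is passed in as plain numbers; in a learned system it would presumably be predicted from the state.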

IMPACT If the results hold up, HABC could make online RL fine-tuning of robot policies more practical in settings where the only reward signal is a binary success or failure per episode.

reinforcement learning
supervised fine-tuning
Hierarchical Advantage-Weighted Behavior Cloning