GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
Researchers have introduced GROW, a reinforcement learning framework for improving vision-language model (VLM) agents on open-world tasks. Where previous methods relied heavily on supervised fine-tuning, GROW adapts the Group Relative Policy Optimization (GRPO) algorithm by decomposing each trajectory into state-action samples. This decomposition mitigates the long-context and noise issues inherent in standard GRPO and enables more effective multi-turn learning. In experiments on over 800 Minecraft tasks, GROW achieved state-of-the-art performance.
IMPACT Improves VLM agent performance on open-world tasks by making multi-turn reinforcement learning more effective.
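
The decomposition described in the summary can be illustrated with a minimal Python sketch. It is not GROW's actual implementation: the summary does not say how credit is assigned to individual steps, so the sketch assumes each state-action sample simply inherits the group-normalized advantage of the trajectory it came from. The names `Trajectory`, `Sample`, `group_relative_advantages`, and `decompose` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple


@dataclass
class Trajectory:
    """One rollout: a list of (state, action) steps and a scalar reward."""
    steps: List[Tuple[str, str]]
    reward: float


@dataclass
class Sample:
    """A single state-action training sample with its advantage."""
    state: str
    action: str
    advantage: float


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: reward normalized by the group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    return [(r - mu) / (sigma + eps) for r in rewards]


def decompose(group: List[Trajectory]) -> List[Sample]:
    """Split each trajectory into per-step samples.

    ASSUMPTION: every step inherits its trajectory's advantage. The source
    does not specify GROW's credit assignment; this is only one plausible choice.
    """
    advantages = group_relative_advantages([t.reward for t in group])
    samples = []
    for traj, adv in zip(group, advantages):
        for state, action in traj.steps:
            samples.append(Sample(state, action, adv))
    return samples


if __name__ == "__main__":
    group = [
        Trajectory(steps=[("s0", "chop_tree"), ("s1", "craft_planks")], reward=1.0),
        Trajectory(steps=[("s0", "walk_away")], reward=0.0),
    ]
    for sample in decompose(group):
        print(sample)
```

Under this assumption, each training sample contains a single state and action rather than the full multi-turn history, so the policy update does not need the whole trajectory in its context.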