Reinforcing VLAs in Task-Agnostic World Models
Researchers have introduced RAW-Dream, a new paradigm for adapting Vision-Language-Action (VLA) models without task-specific data. The approach uses a pre-trained, task-agnostic world model to predict future trajectories and an off-the-shelf Vision-Language Model (VLM) to generate rewards. By disentangling world model learning from downstream tasks, RAW-Dream enables zero-shot adaptation of VLAs, with experiments showing performance gains in both simulated and real-world scenarios.
AI IMPACT: Enables more scalable adaptation of VLA models to new tasks by removing the need for task-specific data.
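
The summary describes a loop in which a policy is improved entirely on imagined trajectories: a world model predicts outcomes and a VLM scores them. The toy sketch below illustrates that general idea only, under heavy assumptions. The names `world_model`, `vlm_reward`, `imagine_rollout`, and `adapt_policy` are hypothetical, the "world model" is a one-dimensional stand-in, the "VLM" is a distance score, and the update rule is plain REINFORCE, which is not necessarily the algorithm RAW-Dream uses.

```python
import random

def world_model(state, action):
    """Hypothetical stand-in for a pre-trained, task-agnostic world model:
    predicts the next state from the current state and an action."""
    return state + action

def vlm_reward(final_state, goal):
    """Hypothetical stand-in for an off-the-shelf VLM that scores an imagined
    outcome against the instruction. Here: negative distance to a goal."""
    return -abs(final_state - goal)

def imagine_rollout(mean, start, horizon, rng, std=0.5):
    """Roll a Gaussian policy forward inside the world model (no real data)."""
    state, actions = start, []
    for _ in range(horizon):
        action = rng.gauss(mean, std)
        actions.append(action)
        state = world_model(state, action)
    return actions, state

def adapt_policy(goal, start=0.0, horizon=5, iters=200, batch=16,
                 lr=0.01, std=0.5, seed=0):
    """Adapt the policy mean with REINFORCE on imagined rollouts only."""
    rng = random.Random(seed)
    mean = 0.0  # single policy parameter (assumption: toy Gaussian policy)
    for _ in range(iters):
        rollouts = [imagine_rollout(mean, start, horizon, rng, std)
                    for _ in range(batch)]
        rewards = [vlm_reward(final, goal) for _, final in rollouts]
        baseline = sum(rewards) / batch  # batch-mean baseline reduces variance
        grad = 0.0
        for (actions, _), reward in zip(rollouts, rewards):
            # Gradient of log N(a; mean, std) w.r.t. mean is (a - mean) / std^2
            score = sum((a - mean) / std ** 2 for a in actions)
            grad += (reward - baseline) * score
        mean += lr * grad / batch
    return mean

if __name__ == "__main__":
    learned = adapt_policy(goal=5.0)
    print(f"learned per-step action mean: {learned:.2f}")
```

With a goal of 5.0 and a horizon of 5 steps, the best per-step action in this toy setting is 1.0, so the learned mean should settle near that value. The point of the sketch is the separation of roles: the world model and the reward function are fixed and know nothing about the task beyond the goal passed in at adaptation time.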