Wall-OSS-0.5: 4B VLA with open training code and zero-shot real-robot evaluation
X Square Robot has released Wall-OSS-0.5, a 4-billion-parameter vision-language-action (VLA) model. It is built on a 3-billion-parameter vision-language model backbone and adds action experts using a Mixture-of-Transformers architecture. Notably, the authors evaluate the model on real robots before any fine-tuning, reporting strong zero-shot capabilities and significant improvements after task-specific adaptation.
IMPACT: This release makes both the training code and a model for vision-language-action tasks openly available, which could accelerate research and development in embodied AI and robotics.
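
For readers unfamiliar with the Mixture-of-Transformers idea mentioned in the summary, the following is a minimal, generic sketch and not the Wall-OSS-0.5 implementation, whose internals are not described here. It assumes the common formulation in which each token is processed with the weights of its own modality's expert while attention runs over the whole sequence. The names `Expert`, `mot_layer`, `vision_language`, and `action` are hypothetical, and the layer is simplified to a single head with no normalization.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols):
    """Small random matrix, used only to make the sketch runnable."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class Expert:
    """Hypothetical per-modality parameters: Q/K/V projections and a feed-forward map."""

    def __init__(self, rng, dim):
        self.wq = rand_matrix(rng, dim, dim)
        self.wk = rand_matrix(rng, dim, dim)
        self.wv = rand_matrix(rng, dim, dim)
        self.w_ff = rand_matrix(rng, dim, dim)


def mot_layer(tokens, modalities, experts):
    """One simplified layer: per-modality weights, attention shared across all tokens."""
    dim = len(tokens[0])
    # Each token is projected with the weights of its own modality's expert.
    q = [matvec(experts[m].wq, t) for t, m in zip(tokens, modalities)]
    k = [matvec(experts[m].wk, t) for t, m in zip(tokens, modalities)]
    v = [matvec(experts[m].wv, t) for t, m in zip(tokens, modalities)]
    outputs = []
    for i, m in enumerate(modalities):
        # Global self-attention: token i attends to every token, regardless of modality.
        scores = [sum(a * b for a, b in zip(q[i], kj)) / math.sqrt(dim) for kj in k]
        weights = softmax(scores)
        attended = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(dim)]
        h = [a + b for a, b in zip(tokens[i], attended)]  # residual connection
        # Feed-forward step again uses the token's own expert (ReLU, then residual).
        ff = [max(0.0, y) for y in matvec(experts[m].w_ff, h)]
        outputs.append([a + b for a, b in zip(h, ff)])
    return outputs


rng = random.Random(0)
dim = 4
experts = {"vision_language": Expert(rng, dim), "action": Expert(rng, dim)}
modalities = ["vision_language"] * 3 + ["action"] * 2
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in modalities]
outputs = mot_layer(tokens, modalities, experts)
```

In this sketch the action tokens can read from the vision-language tokens through the shared attention step, while each group keeps its own parameters. That separation is what would let an action expert be added alongside an existing backbone in a design of this kind.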