Brief · PulseAugur

TOOL · r/MachineLearning · English (EN) · 2w

Wall-OSS-0.5: 4B VLA with open training code and zero-shot real-robot evaluation [D]

X Square Robot has released Wall-OSS-0.5, a 4-billion-parameter vision-language-action (VLA) model. It is built on a 3-billion-parameter vision-language model (VLM) backbone and adds action experts through a Mixture-of-Transformers architecture. Notably, the research evaluates the model on real robots before any fine-tuning. It reports strong zero-shot performance and significant further gains after task-specific adaptation.
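
The Mixture-of-Transformers idea can be sketched in a few lines of NumPy. This is a generic, hypothetical illustration and not Wall-OSS-0.5's code. It makes the following assumptions:

- There are two modalities. Vision-language tokens use the backbone weights (expert 0), and action tokens use an action expert (expert 1).
- Each modality has its own attention projections and feed-forward weights.
- A single self-attention runs over the full token sequence, so the modalities still interact.

Layer norms, attention masks, and multi-head splitting are omitted. The names `init_expert` and `mot_layer` are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_expert(rng, d_model, d_ff):
    # Hypothetical per-modality parameter set: attention projections + FFN
    s = 1.0 / np.sqrt(d_model)
    return {
        "wq": rng.normal(0.0, s, (d_model, d_model)),
        "wk": rng.normal(0.0, s, (d_model, d_model)),
        "wv": rng.normal(0.0, s, (d_model, d_model)),
        "wo": rng.normal(0.0, s, (d_model, d_model)),
        "w1": rng.normal(0.0, s, (d_model, d_ff)),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(d_ff), (d_ff, d_model)),
    }

def mot_layer(x, modality, experts):
    """One simplified MoT layer. x: (seq, d_model); modality: (seq,) expert index per token."""
    d = x.shape[1]
    q, k, v = np.empty_like(x), np.empty_like(x), np.empty_like(x)
    # Modality-specific Q/K/V projections: each token uses its own modality's weights
    for m, p in enumerate(experts):
        idx = modality == m
        q[idx], k[idx], v[idx] = x[idx] @ p["wq"], x[idx] @ p["wk"], x[idx] @ p["wv"]
    # Shared global self-attention over all tokens, regardless of modality
    attn = softmax(q @ k.T / np.sqrt(d)) @ v
    out = np.empty_like(x)
    for m, p in enumerate(experts):
        idx = modality == m
        # Modality-specific output projection and FFN, each with a residual connection
        h = x[idx] + attn[idx] @ p["wo"]
        out[idx] = h + np.maximum(h @ p["w1"], 0.0) @ p["w2"]
    return out

rng = np.random.default_rng(0)
d_model, d_ff = 16, 32
experts = [init_expert(rng, d_model, d_ff), init_expert(rng, d_model, d_ff)]
x = rng.normal(size=(10, d_model))
modality = np.array([0] * 6 + [1] * 4)  # 6 vision-language tokens, then 4 action tokens
y = mot_layer(x, modality, experts)
print(y.shape)  # (10, 16)
```

In this sketch, routing is fixed by each token's modality label rather than by a learned router. That deterministic routing is what distinguishes Mixture-of-Transformers from a learned-router Mixture-of-Experts. The shared attention step is what lets action tokens condition on vision-language context.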

IMPACT · The release makes both the model and its training code openly available for vision-language-action research. This could accelerate research and development in embodied AI and robotics.

Mixture-of-Transformers
X Square Robot
Wall-OSS-0.5
Vision-Aligned RVQ
DMuon