PulseAugur

X Square Robot releases 4B VLA model with open code, real-robot tests

X Square Robot has released Wall-OSS-0.5, a 4-billion-parameter vision-language-action (VLA) model. It is built on a 3-billion-parameter vision-language model (VLM) backbone, with action experts arranged in a Mixture-of-Transformers architecture. Notably, the accompanying report evaluates the pretrained checkpoint on real robots before any task-specific fine-tuning, reporting strong zero-shot capabilities and significant improvements after task-specific adaptation.

IMPACT This release provides open-source code and a model for vision-language-action tasks, potentially accelerating research and development in embodied AI and robotics.

RANK_REASON This is a release of an open-source model with an accompanying research paper and code, detailing novel methods and evaluations.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/MachineLearning TIER_1 English(EN) · /u/Tall-Peak2618 ·

    Wall-OSS-0.5: 4B VLA with open training code and zero-shot real-robot evaluation [D]

    Wall-OSS-0.5 is a new 4B VLA release from X Square Robot, built on a 3B VLM backbone with action experts in a Mixture-of-Transformers layout. What caught my eye is that the report evaluates the pretrained checkpoint on real robots before task-spe…