PulseAugur

New Research Rethinks VLM Initialization for Action Models

A new paper examines how pretrained Vision-Language Model (VLM) representations affect the initialization of Vision-Language-Action (VLA) models. The research indicates that preserving the original VLM representation is crucial for action performance, whereas full finetuning can be detrimental. Techniques such as LoRA and staged robot-data pretraining show promise for improving VLA initialization because they inject action-relevant signals without heavily altering the core VLM.
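To make the LoRA idea concrete, here is a minimal pure-Python sketch of a low-rank adapter wrapped around a frozen weight matrix. It is an illustration only, not the paper's implementation: the class name `LoRALinear`, the helper `matvec`, and the parameters `rank`, `alpha`, and `seed` are all hypothetical. The point it shows is that the adapter starts as an exact no-op, so the pretrained representation is untouched at initialization and only a small number of added parameters would be trained.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Illustrative low-rank adapter: y = W x + (alpha / rank) * B (A x).

    W is treated as frozen (never updated); only A and B would be trained.
    All names here are assumptions made for this sketch.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, shape (out_dim, in_dim)
        # A gets a small random init; B starts at zero so the update is zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # original pretrained path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank correction
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_parameter_count(self):
        rank, in_dim = len(self.A), len(self.A[0])
        out_dim = len(self.B)
        return rank * in_dim + out_dim * rank


# A hypothetical 3x4 "pretrained" weight, standing in for one VLM layer.
W = [[1.0, 2.0, 3.0, 4.0],
     [0.5, 0.0, -1.0, 2.0],
     [0.0, 1.0, 0.0, -1.0]]
layer = LoRALinear(W, rank=1)
x = [1.0, 1.0, 1.0, 1.0]
```

Because `B` is initialized to zero, `layer.forward(x)` matches the frozen layer's output before any training, and the adapter adds 7 trainable values here versus 12 in the full matrix. Full finetuning would instead update every entry of `W` directly.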

IMPACT Preserving core VLM representations and using methods like LoRA can improve action model performance.



AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. Hugging Face Daily Papers TIER_1 English(EN)

    Rethinking VLM Representation for VLA Initialization

    Effective vision-language-action model initialization requires balancing pretrained vision-language model representations with embodied task-specific adaptations and robot-data pretraining while preserving core action-relevant features.

  2. arXiv cs.CV TIER_1 English(EN) · Weifeng Lin, Siyuan Huang, Hao Li, Tingwei Chen, Ruichuan An, Xinyu Wei, Jianbo Liu, Hongsheng Li

    Rethinking VLM Representation for VLA Initialization

    arXiv:2605.25802v1. Abstract: Vision-Language-Action (VLA) models widely adopt pretrained Vision-Language Models (VLMs) as policy backbones, yet it remains unclear what kind of pretrained VLM representation is useful as a VLA initialization. In this paper, we st…

  3. arXiv cs.CV TIER_1 English(EN) · Hongsheng Li

    Rethinking VLM Representation for VLA Initialization

    Vision-Language-Action (VLA) models widely adopt pretrained Vision-Language Models (VLMs) as policy backbones, yet it remains unclear what kind of pretrained VLM representation is useful as a VLA initialization. In this paper, we study VLA initialization as a controlled represent…