PulseAugur
English(EN) Rethinking VLM Representation for VLA Initialization

New research rethinks VLM initialization for action models

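A new paper examines how best to initialize vision-language-action (VLA) models by studying the effect of pretrained vision-language model (VLM) representations. The study indicates that preserving the original VLM representation is critical for action performance, and that full fine-tuning can be counterproductive. Techniques such as LoRA and staged robot-data pretraining show promise for improving VLA initialization by injecting action-relevant signal without heavily altering the core VLM.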
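To make the LoRA idea concrete, here is a minimal sketch of a low-rank adapter on a single linear layer. It is an illustration only, not the paper's implementation: the class name `LoRALinear`, the helper functions, and the `rank`, `alpha`, and `seed` values are all assumptions chosen for the example. The pretrained weight stays frozen, and only two small matrices `A` and `B` would be trained. Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer's output exactly.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


class LoRALinear:
    """Frozen weight W plus a low-rank update scale * (B @ A). Hypothetical name."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weight: kept frozen, never modified here
        # A gets a small random init, B starts at zero, so the update B @ A is zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matmul(x, transpose(self.weight))                       # x @ W^T
        low = matmul(matmul(x, transpose(self.A)), transpose(self.B))  # x @ A^T @ B^T
        return [[b + self.scale * u for b, u in zip(br, ur)]
                for br, ur in zip(base, low)]

    def num_trainable(self):
        """Count the adapter parameters (A and B only)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy stand-in for one pretrained layer: 8 inputs, 6 outputs, batch of 3.
data_rng = random.Random(1)
W = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
x = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]

layer = LoRALinear(W, rank=2)
frozen_out = matmul(x, transpose(W))
lora_out = layer.forward(x)
```

In this toy setting the adapter has far fewer trainable parameters than the frozen weight, which is one intuition for why such an update changes the backbone less than full fine-tuning would. Whether that carries over to a real VLA setup depends on the details reported in the paper.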

Impact: Preserving core VLM representations and using methods such as LoRA can improve action-model performance.

Ranking rationale: This cluster contains an academic paper detailing research findings on model initialization techniques.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →

Sources [3]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    Rethinking VLM Representation for VLA Initialization

    Effective vision-language-action model initialization requires balancing pretrained vision-language model representations with embodied task-specific adaptations and robot-data pretraining while preserving core action-relevant features.

  2. arXiv cs.CV TIER_1 English(EN) · Weifeng Lin, Siyuan Huang, Hao Li, Tingwei Chen, Ruichuan An, Xinyu Wei, Jianbo Liu, Hongsheng Li ·

    Rethinking VLM Representation for VLA Initialization

    arXiv:2605.25802v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models widely adopt pretrained Vision-Language Models (VLMs) as policy backbones, yet it remains unclear what kind of pretrained VLM representation is useful as a VLA initialization. In this paper, we st…

  3. arXiv cs.CV TIER_1 English(EN) · Hongsheng Li ·

    Rethinking VLM Representation for VLA Initialization

    Vision-Language-Action (VLA) models widely adopt pretrained Vision-Language Models (VLMs) as policy backbones, yet it remains unclear what kind of pretrained VLM representation is useful as a VLA initialization. In this paper, we study VLA initialization as a controlled represent…