PulseAugur

New diagnostic suggests that the choice of vision encoder depends on VLA backbone scale

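Researchers have developed a new diagnostic method, called "frozen-backbone grafting", for evaluating vision encoders in vision-language-action (VLA) policies. It tests whether an encoder that performs well on a smaller VLA backbone also performs well on a larger one. Experiments across several encoders, VLA task suites, and two backbones (SmolVLA-450M and $\pi_{0.5}$-3.3B) show that the best encoder often depends on both backbone scale and the specific task suite. This indicates that validating an encoder on a small backbone does not reliably predict how it will perform on a large one. The researchers propose the diagnostic as a cost-effective tool for choosing an encoder before scaling up.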

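Impact: Highlights the need to select encoders for each specific backbone in VLA policies, and suggests that current small-scale validation may not generalize to larger models.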

Ranking rationale: This cluster contains a research paper describing a new diagnostic method for evaluating vision encoders in VLA policies.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.CV TIER_1 English (EN) · Qingping Zeng, Fei She

    Encoder Winners Do Not Reliably Transfer Across VLA Backbone Scale: A Frozen-Backbone Grafting Diagnostic

    arXiv:2606.14153v1 Announce Type: new Abstract: Vision-language-action (VLA) policies typically inherit their vision encoder from upstream VLM releases, but it is unclear whether an encoder choice validated on a small VLA transfers to a larger backbone. We introduce a frozen-back…

  2. arXiv cs.CV TIER_1 English (EN) · Fei She

    Encoder Winners Do Not Reliably Transfer Across VLA Backbone Scale: A Frozen-Backbone Grafting Diagnostic

    Vision-language-action (VLA) policies typically inherit their vision encoder from upstream VLM releases, but it is unclear whether an encoder choice validated on a small VLA transfers to a larger backbone. We introduce a frozen-backbone grafting diagnostic: the vision tower of a …