ROAD-VLA: Robust Online Adaptation via Self-Distillation for Vision-Language-Action Models

New methods improve the efficiency and performance of robotic VLA models · Tracking 9 sources

By the PulseAugur editorial team · [11 sources] · 2026-06-18 00:00

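Researchers are developing new methods to improve the efficiency and performance of vision-language-action (VLA) models in robotics. One method, Flow Policy Optimization (FPO), uses reinforcement learning to fine-tune VLA models, overcoming computational challenges with a new algorithm that improves gradient efficiency and stability. Another, VLM-PBRS, uses a vision-language model to learn the potential function for reward shaping, which preserves the optimal policy and accelerates learning without expert-designed reward terms. In addition, ROAD-VLA uses self-distillation to adapt VLA models robustly, outperforming standard methods on robotic manipulation tasks under distribution shift. PolicyTrim targets intrinsic policy efficiency by extending reliable action chunk lengths and reducing redundant physical steps, which significantly speeds up deployment. Finally, EventVLA introduces a sparse visual evidence memory framework to address long-horizon manipulation challenges, improving success rates on complex tasks.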
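The claim that potential-based reward shaping preserves the optimal policy can be checked on a toy problem. The sketch below is not taken from the VLM-PBRS paper: it uses the standard shaping term gamma * phi(s') - phi(s), and the corridor environment, the hand-written potential `phi`, and all function names are assumptions made for illustration (in VLM-PBRS the potential is said to be learned with a vision-language model). It shows that shaping changes the discounted return of every trajectory from the same start state by the same constant, so the ranking of trajectories is unchanged.

```python
from typing import Callable, List, Sequence

GOAL = 4  # hypothetical goal cell in a 1-D corridor


def phi(state: int) -> float:
    """Hand-written stand-in potential: negative distance to the goal."""
    return -float(abs(GOAL - state))


def shaping_term(potential: Callable[[int], float], s: int, s_next: int,
                 gamma: float, done: bool) -> float:
    """Standard potential-based term; the terminal potential is taken as 0."""
    next_phi = 0.0 if done else potential(s_next)
    return gamma * next_phi - potential(s)


def shaped_rewards(states: Sequence[int], rewards: Sequence[float],
                   gamma: float) -> List[float]:
    """Add the shaping term to each reward of one episode."""
    last = len(rewards) - 1
    return [
        r + shaping_term(phi, states[t], states[t + 1], gamma, t == last)
        for t, r in enumerate(rewards)
    ]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    return sum(gamma ** t * r for t, r in enumerate(rewards))


gamma = 0.9
# Two episodes from the same start state: a direct path and a detour.
direct_states, direct_r = [0, 1, 2, 3, 4], [0.0, 0.0, 0.0, 1.0]
detour_states, detour_r = [0, 1, 0, 1, 2, 3, 4], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

gap_direct = (discounted_return(shaped_rewards(direct_states, direct_r, gamma), gamma)
              - discounted_return(direct_r, gamma))
gap_detour = (discounted_return(shaped_rewards(detour_states, detour_r, gamma), gamma)
              - discounted_return(detour_r, gamma))
```

Both gaps equal -phi(0), because the shaping terms telescope along the episode. The offset depends only on the start state, which is why the shaped and unshaped problems share the same optimal policy while the shaped one provides denser intermediate feedback.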

Impact: These advances in VLA models could lead to more capable and more efficient robots for complex manipulation tasks.

Ranking rationale: Multiple research papers introduce new methods and frameworks for improving vision-language-action models.


AI-generated summary · Google Gemini · from 11 sources.

Sources [11]

arXiv cs.LG TIER_1 English(EN) · Mingyang Lyu, Yinqian Sun, Erliang Lin, Huangrui Li, Ruolin Chen, Feifei Zhao, Yi Zeng · 2026-06-26 04:00

Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models

arXiv:2510.09976v2 Announce Type: replace Abstract: Vision-Language-Action (VLA) models such as OpenVLA, Octo, and $\pi_0$ have shown strong generalization by leveraging large-scale demonstrations, yet their performance is still fundamentally constrained by the quality and covera…
arXiv cs.AI TIER_1 English(EN) · Henrik Müller, Daniel Kudenko · 2026-06-26 04:00

Automating Potential-based Reward Shaping with Vision Language Model Guidance

arXiv:2606.27180v1 Announce Type: cross Abstract: Sparse rewards are inherently challenging for reinforcement learning agents as they lack intermediate feedback to guide exploration and to correctly attribute the sparse success rewards to relevant parts of the trajectory. Naive r…
arXiv cs.LG TIER_1 English(EN) · Daniel Kudenko · 2026-06-25 15:45

Automating Potential-based Reward Shaping with Vision Language Model Guidance

Sparse rewards are inherently challenging for reinforcement learning agents as they lack intermediate feedback to guide exploration and to correctly attribute the sparse success rewards to relevant parts of the trajectory. Naive reward shaping can induce reward hacking, yielding …
arXiv cs.LG TIER_1 English(EN) · Kejing Wang, Toan Nguyen, Minh Hoang Nguyen, Simon Khan, Flora D. Salim · 2026-06-25 04:00

ROAD-VLA: Robust Online Adaptation via Self-Distillation for Vision-Language-Action Models

arXiv:2606.25800v1 Announce Type: new Abstract: Effective online adaptation of vision-language-action (VLA) models remains challenging, as sparse rewards provide weak supervision for high-dimensional autoregressive action policies. Although self-distillation can in principle prov…
arXiv cs.LG TIER_1 English(EN) · Flora D. Salim · 2026-06-24 13:17

ROAD-VLA: Robust Online Adaptation via Self-Distillation for Vision-Language-Action Models

Effective online adaptation of vision-language-action (VLA) models remains challenging, as sparse rewards provide weak supervision for high-dimensional autoregressive action policies. Although self-distillation can in principle provide denser training signals, we find that text-b…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-21 00:00

PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models

PolicyTrim is a reinforcement learning-based framework that enhances VLA model efficiency by extending reliable action chunk lengths and reducing redundant physical steps through dynamic exploration and redundancy-aware rewards.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-18 00:00

EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies

EventVLA addresses long-horizon robotic manipulation challenges by introducing a sparse visual evidence memory framework with visual anchors and a dynamic Keyframe Evidence Memory module for improved task performance.
arXiv cs.CV TIER_1 English(EN) · Yuan Xu, Yixiang Chen, Kai Wang, Jiabing Yang, Peiyan Li, Qisen Ma, Yan Huang, Liang Wang · 2026-06-26 04:00

Improving Vision-Language-Action Model Fine-Tuning with Structured Stage and Keyframe Supervision

arXiv:2606.26801v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have shown strong potential for generalizable robotic manipulation. During fine-tuning, however, action supervision applies equally across all timesteps, without structured supervision on which …
arXiv cs.CV TIER_1 English(EN) · Liang Wang · 2026-06-25 09:38

Improving Vision-Language-Action Model Fine-Tuning with Structured Stage and Keyframe Supervision

Vision-Language-Action (VLA) models have shown strong potential for generalizable robotic manipulation. During fine-tuning, however, action supervision applies equally across all timesteps, without structured supervision on which manipulation stage the robot is in or what the nex…
arXiv cs.CV TIER_1 English(EN) · Xianghui Wang, Feng Chen, Wenbo Zhang, Hua Yan, Zixuan Wang, Changsheng Li, Yinjie Lei · 2026-06-25 04:00

PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models

arXiv:2606.22540v2 Announce Type: replace Abstract: Vision-Language-Action (VLA) models provide a unified paradigm for robotic manipulation, yet their real-world deployment is often bottlenecked by execution efficiency. While existing efforts predominantly focus on compute-centri…
arXiv cs.CV TIER_1 English(EN) · Feng Chen, Xianghui Wang, Yuxuan Chen, Boying Li, Yefei He, Zeyu Zhang, Yicheng Wu · 2026-06-24 04:00

Dynamic Execution Commitment of Vision-Language-Action Models

arXiv:2605.11567v3 Announce Type: replace Abstract: Vision-Language-Action (VLA) models predominantly adopt action chunking, i.e., predicting and committing to a short horizon of consecutive low-level actions in a single forward pass, to amortize the inference cost of large-scale…
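Action chunking as described in the abstract above (one forward pass predicts a short horizon of actions, and the robot commits to some of them before re-planning) can be sketched as follows. This is only an illustration with a fixed commitment length, not the dynamic scheme the paper proposes; the toy `policy`, the `step` function, and the name `run_with_chunking` are all hypothetical.

```python
from typing import Callable, List, Tuple


def run_with_chunking(policy: Callable[[float], List[float]],
                      step: Callable[[float, float], float],
                      obs: float, total_steps: int,
                      commit: int) -> Tuple[float, int]:
    """Execute `total_steps` actions, re-querying the policy every `commit` steps.

    Returns the final observation and the number of policy forward passes.
    """
    if commit < 1:
        raise ValueError("commit must be at least 1")
    calls = 0
    executed = 0
    while executed < total_steps:
        chunk = policy(obs)  # one (expensive) forward pass predicts a whole chunk
        calls += 1
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        for action in chunk[:commit]:  # commit to the first `commit` actions
            if executed >= total_steps:
                break
            obs = step(obs, action)
            executed += 1
    return obs, calls


def toy_policy(obs: float) -> List[float]:
    """Stand-in for a VLA model: always predicts a chunk of 8 unit actions."""
    return [1.0] * 8


def toy_step(obs: float, action: float) -> float:
    """Stand-in for the environment: the observation is a running position."""
    return obs + action


final_single, calls_single = run_with_chunking(toy_policy, toy_step, 0.0, 24, commit=1)
final_chunked, calls_chunked = run_with_chunking(toy_policy, toy_step, 0.0, 24, commit=8)
```

In this toy setting both runs execute the same 24 actions, but committing to 8 actions per chunk needs 3 forward passes instead of 24. The trade-off is that a longer commitment reacts later to new observations.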

