ROAD-VLA: Robust Online Adaptation via Self-Distillation for Vision-Language-Action Models
New methods boost the efficiency and performance of robot VLA models · Tracking 9 sources
By the PulseAugur editorial team · [11 sources] ·
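Researchers are developing new methods to improve the efficiency and performance of vision-language-action (VLA) models in robotics. One approach, Flow Policy Optimization (FPO), fine-tunes VLA models with reinforcement learning, using a new algorithm that improves gradient efficiency and stability to overcome computational challenges. Another, VLM-PBRS, uses a vision-language model to learn the potential function for reward shaping, which preserves the optimal policy and accelerates learning without expert-designed reward terms. ROAD-VLA uses self-distillation to adapt VLA models robustly, and it outperforms standard methods on robot manipulation tasks under distribution shift. PolicyTrim targets intrinsic policy efficiency: it extends reliable action chunk lengths and cuts redundant physical steps, which significantly speeds up deployment. Finally, EventVLA introduces a sparse visual evidence memory framework to address long-horizon manipulation, improving success rates on complex tasks.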
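The claim that potential-based reward shaping preserves the optimal policy can be checked on a toy problem. The sketch below is not the VLM-PBRS implementation: `toy_potential`, `GOAL`, and the 1-D chain are hypothetical stand-ins for a learned potential and a real task. It applies the standard shaping term γΦ(s') − Φ(s), with the potential fixed to zero at terminal states, and compares two trajectories that start in the same state.

```python
from typing import Callable, List, Sequence

GOAL = 4  # hypothetical goal position on a 1-D chain


def toy_potential(state: int) -> float:
    """Hypothetical stand-in for a learned potential: negative distance to goal."""
    return -float(abs(GOAL - state))


def shaped_reward(reward: float, state: int, next_state: int,
                  potential: Callable[[int], float], gamma: float,
                  terminal: bool) -> float:
    """Return r + gamma * Phi(s') - Phi(s), with Phi(s') = 0 at terminal states."""
    next_phi = 0.0 if terminal else potential(next_state)
    return reward + gamma * next_phi - potential(state)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma^t * r_t over one trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def shaped_rewards(states: Sequence[int], rewards: Sequence[float],
                   potential: Callable[[int], float], gamma: float) -> List[float]:
    """Shape every transition; the last transition is treated as terminal."""
    last = len(rewards) - 1
    return [
        shaped_reward(rewards[t], states[t], states[t + 1], potential, gamma,
                      terminal=(t == last))
        for t in range(len(rewards))
    ]


GAMMA = 0.9

# Two trajectories from the same start state, sparse reward only at the goal.
direct_states = [0, 1, 2, 3, 4]
direct_rewards = [0.0, 0.0, 0.0, 1.0]
detour_states = [0, 1, 0, 1, 2, 3, 4]
detour_rewards = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

# Difference between the shaped and the original return for each trajectory.
direct_offset = (
    discounted_return(shaped_rewards(direct_states, direct_rewards, toy_potential, GAMMA), GAMMA)
    - discounted_return(direct_rewards, GAMMA)
)
detour_offset = (
    discounted_return(shaped_rewards(detour_states, detour_rewards, toy_potential, GAMMA), GAMMA)
    - discounted_return(detour_rewards, GAMMA)
)
```

Because the shaping terms telescope, both offsets equal −Φ(s₀) up to floating-point rounding, regardless of the path taken. Shaping therefore shifts every return from a given start state by the same constant and leaves the ranking of trajectories unchanged, while still giving the agent intermediate feedback at each step.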
AI
arXiv:2510.09976v2 Announce Type: replace Abstract: Vision-Language-Action (VLA) models such as OpenVLA, Octo, and $\pi_0$ have shown strong generalization by leveraging large-scale demonstrations, yet their performance is still fundamentally constrained by the quality and covera…
arXiv cs.AI
Henrik Müller, Daniel Kudenko
arXiv:2606.27180v1 Announce Type: cross Abstract: Sparse rewards are inherently challenging for reinforcement learning agents as they lack intermediate feedback to guide exploration and to correctly attribute the sparse success rewards to relevant parts of the trajectory. Naive reward shaping can induce reward hacking, yielding …
arXiv cs.LG
Kejing Wang, Toan Nguyen, Minh Hoang Nguyen, Simon Khan, Flora D. Salim
arXiv:2606.25800v1 Announce Type: new Abstract: Effective online adaptation of vision-language-action (VLA) models remains challenging, as sparse rewards provide weak supervision for high-dimensional autoregressive action policies. Although self-distillation can in principle provide denser training signals, we find that text-b…
PolicyTrim is a reinforcement learning-based framework that enhances VLA model efficiency by extending reliable action chunk lengths and reducing redundant physical steps through dynamic exploration and redundancy-aware rewards.
arXiv:2606.26801v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have shown strong potential for generalizable robotic manipulation. During fine-tuning, however, action supervision applies equally across all timesteps, without structured supervision on which manipulation stage the robot is in or what the nex…
arXiv:2606.22540v2 Announce Type: replace Abstract: Vision-Language-Action (VLA) models provide a unified paradigm for robotic manipulation, yet their real-world deployment is often bottlenecked by execution efficiency. While existing efforts predominantly focus on compute-centri…
arXiv:2605.11567v3 Announce Type: replace Abstract: Vision-Language-Action (VLA) models predominantly adopt action chunking, i.e., predicting and committing to a short horizon of consecutive low-level actions in a single forward pass, to amortize the inference cost of large-scale…
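Action chunking, as described in the abstract above, trades observation frequency for fewer forward passes. The following is a minimal sketch of that control loop, not any paper's implementation: `toy_policy`, `toy_step`, and `run_chunked` are hypothetical names, and the scalar observation and action are placeholders for real sensor inputs and robot commands.

```python
from typing import Callable, List, Tuple

Observation = float
Action = float


def toy_policy(obs: Observation, horizon: int) -> List[Action]:
    """Hypothetical policy: one forward pass emits `horizon` open-loop actions."""
    return [-0.1 * obs] * horizon


def toy_step(obs: Observation, action: Action) -> Observation:
    """Hypothetical environment: the action is added to the scalar state."""
    return obs + action


def run_chunked(policy: Callable[[Observation, int], List[Action]],
                step: Callable[[Observation, Action], Observation],
                obs: Observation, chunk_len: int,
                max_steps: int) -> Tuple[List[Action], int]:
    """Execute up to `max_steps` actions, querying the policy once per chunk."""
    executed: List[Action] = []
    forward_passes = 0
    while len(executed) < max_steps:
        chunk = policy(obs, chunk_len)  # single forward pass for the whole chunk
        forward_passes += 1
        for action in chunk:
            if len(executed) >= max_steps:
                break
            # The policy is committed to the chunk: no re-planning in between.
            obs = step(obs, action)
            executed.append(action)
    return executed, forward_passes


# Same number of physical steps, different number of policy calls.
actions_1, passes_1 = run_chunked(toy_policy, toy_step, 1.0, chunk_len=1, max_steps=8)
actions_4, passes_4 = run_chunked(toy_policy, toy_step, 1.0, chunk_len=4, max_steps=8)
```

With a chunk length of 4 the loop runs the same 8 steps with 2 policy calls instead of 8. The cost is that the actions inside a chunk are computed from a stale observation, which is why the reliability of longer chunks matters for approaches that try to extend them.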