Robots Learn and Improve Policies Through Visual and Tactile Feedback · Tracking 5 sources

By PulseAugur Editorial · [8 sources] · 2026-06-16 04:00

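Researchers have developed new frameworks that improve robot policy performance through inference-time steering and self-improvement. VERITAS is a generator-verifier framework that uses a pretrained policy and a visual verifier to steer actions without additional training, achieving performance gains comparable to those obtained with expert demonstrations. ViTaL extends this approach by integrating tactile feedback alongside visual data for contact-rich manipulation tasks, significantly improving success rates. In addition, Visual-OPSD and ViGOS explore on-policy self-distillation for multimodal large language models, decoupling perception from reasoning to improve grounded behavior and reduce inference cost.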
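The generator-verifier idea summarized above can be illustrated with a generic best-of-N loop: a frozen policy proposes several candidate actions and a separate verifier picks the one it scores highest, with no gradient update. This is only a minimal sketch of that general pattern, not the VERITAS or ViTaL method itself; the names `steer`, `sample_action`, `verifier_score`, and `toy_verifier`, and the squared-distance scoring rule, are all hypothetical stand-ins.

```python
from typing import Callable, Sequence

Action = Sequence[float]


def steer(
    sample_action: Callable[[], Action],
    verifier_score: Callable[[Action], float],
    num_candidates: int = 8,
) -> Action:
    """Sample candidates from a frozen policy and return the verifier's top pick.

    Both callables are hypothetical: `sample_action` stands in for a pretrained
    generative policy, `verifier_score` for a learned visual (or visuo-tactile)
    verifier. Neither is updated here, so no extra training is involved.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [sample_action() for _ in range(num_candidates)]
    # Ties go to the earliest candidate, which is how max() behaves.
    return max(candidates, key=verifier_score)


def toy_verifier(action: Action, goal: Action = (0.5, -0.2)) -> float:
    """Toy score (an assumption): negative squared distance to a goal action."""
    return -sum((a - g) ** 2 for a, g in zip(action, goal))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    policy = lambda: (rng.uniform(-1, 1), rng.uniform(-1, 1))
    print(steer(policy, toy_verifier, num_candidates=16))
```

Raising `num_candidates` trades extra policy and verifier calls at deployment time for a better chance that at least one candidate scores well.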

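Impact: These advances could lead to more adaptive and efficient AI systems in robotics and multimodal reasoning, with less reliance on human intervention and lower compute cost.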

Ranking rationale: Multiple arXiv papers detail new research frameworks in AI and robotics.

AI-generated summary · Google Gemini · from 8 sources.

Sources [8]

arXiv cs.LG TIER_1 English(EN) · Sihan Wang, Xiyao Liu, Lianqing Liu, Zhi Han · 2026-06-18 04:00

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

arXiv:2606.19120v1 · Announce type: new · Abstract: On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to multim…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-17 11:59

Visual-OPSD: Cross-Modal On-Policy Self-Distillation for Efficient Unified Multimodal Reasoning

Unified multimodal models (UMMs) interleave generated "visual thoughts" (VTs) with text reasoning to improve spatial tasks. This incurs roughly an order-of-magnitude inference cost from multi-step diffusion. We find this cost yields limited direct benefit. On ThinkMorph, removi…
arXiv cs.AI TIER_1 English(EN) · Mingtong Zhang, Dhruv Shah · 2026-06-17 04:00

Visual Verification Enables Inference-Time Steering and Autonomous Policy Improvement

arXiv:2606.18247v1 · Announce type: cross · Abstract: Robots deployed in the real world should learn from their experience and improve over time. This requires a mechanism of practicing and learning from feedback. In this paper, we propose VERITAS, a generator-verifier framework for …
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-17 00:00

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

ViGOS is a visually grounded on-policy self-distillation framework for multimodal large language models that improves image-grounded behavior by using specialized teachers for different stages of reasoning and handling invalid rollouts.
arXiv cs.AI TIER_1 English(EN) · Dhruv Shah · 2026-06-16 17:59

Visual Verification Enables Inference-Time Steering and Autonomous Policy Improvement

Robots deployed in the real world should learn from their experience and improve over time. This requires a mechanism of practicing and learning from feedback. In this paper, we propose VERITAS, a generator-verifier framework for generalist robot policies for inference-time polic…
arXiv cs.AI TIER_1 English(EN) · Yilin Wu, Zilin Si, Zeynep Temel, Oliver Kroemer, Andrea Bajcsy · 2026-06-16 04:00

Inference-Time Policy Steering Through Vision and Touch

arXiv:2606.14981v1 · Announce type: cross · Abstract: Inference-time steering adapts pre-trained generative robot policies during deployment by verifying candidate actions before execution. While prior methods typically perform this verification only with visual observations, vision …
arXiv cs.CV TIER_1 English(EN) · Zhi Han · 2026-06-17 14:33

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to multimodal large language models (MLLMs) can create a …
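
The "dense token-level targets" in the abstract above can be pictured as a per-token divergence between two next-token distributions over the same student-generated rollout: one from the frozen copy, which also sees the reference target, and one from the trainable student, which does not. The sketch below is a toy illustration under stated assumptions, not the paper's objective: the function names are hypothetical, the logits are passed in as plain lists rather than produced by a model, and the choice of KL(teacher || student) averaged over tokens is an assumption.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kl(teacher_logits: Sequence[float], student_logits: Sequence[float]) -> float:
    """KL(teacher || student) for a single token position."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def opsd_token_loss(
    teacher_logits_per_token: Sequence[Sequence[float]],
    student_logits_per_token: Sequence[Sequence[float]],
) -> float:
    """Mean per-token KL over one rollout (hypothetical helper).

    Assumption: both inputs are aligned to the same student-sampled tokens.
    The teacher logits would come from the frozen copy conditioned on the
    reference target, the student logits from the model being trained.
    """
    if not teacher_logits_per_token:
        raise ValueError("rollout must contain at least one token")
    if len(teacher_logits_per_token) != len(student_logits_per_token):
        raise ValueError("teacher and student must cover the same tokens")
    kls = [
        token_kl(t, s)
        for t, s in zip(teacher_logits_per_token, student_logits_per_token)
    ]
    return sum(kls) / len(kls)
```

In a real training loop the student logits would carry gradients and the teacher logits would be computed without them; this sketch only shows the arithmetic of the per-token signal.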
arXiv cs.CV TIER_1 English(EN) · Jun Liu · 2026-06-17 11:59

Visual-OPSD: Cross-Modal On-Policy Self-Distillation for Efficient Unified Multimodal Reasoning

Unified multimodal models (UMMs) interleave generated "visual thoughts" (VTs) with text reasoning to improve spatial tasks. This incurs roughly an order-of-magnitude inference cost from multi-step diffusion. We find this cost yields limited direct benefit. On ThinkMorph, removi…
