X-Foresight: A Joint Vision-Action Causal Forecasting Network via Predictive World Modeling

New X-Foresight model strengthens VLA systems with predictive world modeling

By the PulseAugur editorial team · [1 source] · 2026-05-26 04:00

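Researchers have developed X-Foresight, a novel predictive world model integrated into vision-language-action (VLA) models. It aims to equip VLA systems with physical-world knowledge by predicting future video sequences, addressing challenges such as trivial extrapolation and long-horizon causality. X-Foresight uses a chunk-wise autoregressive strategy and temporal importance sampling to learn world dynamics and causal relationships more effectively, and it outperforms existing VLA baselines on planning tasks.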
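The two techniques named above can be illustrated with a minimal toy sketch. This is not the paper's implementation: the summary gives no details of either method, so the function names (`rollout_in_chunks`, `sample_frame_indices`, `toy_predictor`), the scalar "frames", and the weighting scheme are all hypothetical stand-ins chosen only to show the general idea.

```python
import random


def rollout_in_chunks(context, predict_chunk, chunk_size, num_chunks):
    """Generic chunk-wise autoregressive rollout (illustrative only).

    Each call to predict_chunk sees all frames produced so far and
    returns chunk_size new frames, which are appended to the history
    before the next chunk is predicted.
    """
    frames = list(context)
    for _ in range(num_chunks):
        chunk = predict_chunk(frames, chunk_size)
        if len(chunk) != chunk_size:
            raise ValueError("predictor returned a chunk of the wrong size")
        frames.extend(chunk)
    # Return only the newly predicted frames, not the original context.
    return frames[len(context):]


def sample_frame_indices(weights, k, rng):
    """Draw k frame indices with probability proportional to weights.

    Assumption: 'temporal importance sampling' is modeled here as plain
    weighted sampling (with replacement) over time steps.
    """
    return rng.choices(range(len(weights)), weights=weights, k=k)


def toy_predictor(frames, chunk_size):
    """Hypothetical stand-in for a learned video model.

    Frames are scalars here; the predictor continues the last linear trend.
    """
    step = frames[-1] - frames[-2]
    return [frames[-1] + step * (i + 1) for i in range(chunk_size)]


if __name__ == "__main__":
    future = rollout_in_chunks([0.0, 1.0], toy_predictor, chunk_size=3, num_chunks=2)
    print(future)
    rng = random.Random(0)
    # Hypothetical weights that favor later time steps.
    print(sample_frame_indices([1, 2, 4, 8], k=5, rng=rng))
```

In a real system the predictor would be a learned video model and the weights would come from some measure of how informative each time step is; both choices are left open here because the source does not specify them.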

Impact: Enhancing VLA models with physical-world knowledge could improve planning and generalization in autonomous systems.

Ranking rationale: This cluster contains a paper that details a new model and method.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CV · Baolu Li, Jingyu Qian, Rui Guo, Yilun Chen, Hanpeng Liu, Yuan Lin, Junhong Zhou, Ruixin Liu, Willow Yang, Yutong Zheng, Zhenli Zhang, Tenglong, … · 2026-05-26 04:00

X-Foresight: A Joint Vision-Action Causal Forecasting Network via Predictive World Modeling

arXiv:2605.24892v1 · Abstract: Physical world knowledge resides mainly in videos. Equipping Vision-Language-Action (VLA) models with such knowledge is fundamental for safe and generalizable planning. Predictive world modeling enables VLA to internalize physical d…
