X-Mind: Efficient Visual Chain-of-Thought via Predictive World Model for End-to-End Driving

X-Mind framework integrates a predictive world model for efficient end-to-end driving

By the PulseAugur editorial team · [1 source] · 2026-06-30 04:00

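Researchers have introduced X-Mind, a new framework intended to strengthen the end-to-end driving capability of Vision-Language-Action (VLA) models by integrating a predictive world model. Unlike earlier approaches that treat world models as external or shallow add-ons, X-Mind internalizes the world model as a visual chain-of-thought (Visual CoT), forcing the model to reason about future environment dynamics before it acts. To address efficiency, X-Mind uses a compact visual-thought representation that compresses a 12-frame future prediction into just 96 tokens, and it applies a recurrent block diffusion scheme to accelerate generation within a single forward pass. The approach is meant to let resource-constrained vehicle platforms deploy large-scale cognitive reasoning for robust, low-latency autonomous driving.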
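To make the token figures above concrete, here is a minimal arithmetic sketch. It only computes the average token budget per predicted frame from the two numbers the summary reports (12 frames, 96 tokens). The function names and the uncompressed baseline of 256 tokens per frame are hypothetical, chosen for illustration; the summary does not say how X-Mind allocates tokens across frames or what an uncompressed frame would cost.

```python
def average_tokens_per_frame(num_frames: int, total_tokens: int) -> float:
    """Average number of tokens available per predicted frame."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return total_tokens / num_frames


def compression_ratio(num_frames: int,
                      uncompressed_tokens_per_frame: int,
                      total_tokens: int) -> float:
    """Ratio of a hypothetical uncompressed token count to the compact one."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return (num_frames * uncompressed_tokens_per_frame) / total_tokens


# Figures reported in the summary: 12 future frames, 96 tokens in total.
avg = average_tokens_per_frame(num_frames=12, total_tokens=96)

# ASSUMPTION: 256 tokens per uncompressed frame is an illustrative baseline,
# not a number taken from the paper or the summary.
ratio = compression_ratio(num_frames=12,
                          uncompressed_tokens_per_frame=256,
                          total_tokens=96)

print(f"average tokens per frame: {avg}")
print(f"compression vs. assumed baseline: {ratio}x")
```

Under the reported figures the average works out to 8 tokens per frame. The compression ratio depends entirely on the assumed baseline, so it should be read as an illustration of scale and not as a result.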

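Impact: By bringing forward-looking reasoning to resource-constrained platforms, the framework could enable more robust and more efficient autonomous driving systems.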

Ranking rationale: This cluster describes a recent research paper on a new AI framework for autonomous driving.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Bohao Zhao, Chengrui Wei, Guangfeng Jiang, Ruixin Liu, Xuejie Lv, Liu Liang, Sutao Deng, Xiuyang Fan, Pengkun Zheng, Jinyun Zhou, Rui Guo, Hanpeng Liu, Yutong Zheng, Yi Guo, Xinlong Zheng, Qingyu Luo, Zhuangzhuang Ding, Yu Zhang, Hang Zhang, Xianming Liu · 2026-06-30 04:00

X-Mind: Efficient Visual Chain-of-Thought via Predictive World Model for End-to-End Driving

arXiv:2606.28758v1. Abstract: Predicting future states is essential for autonomous agents, yet current Vision-Language-Action (VLA) models fundamentally lack this capability, relying instead on reactive perception-action mapping. While integrating Predictive W…

