PulseAugur

New Vision-Language Models Improve Efficiency and Spatial Reasoning in Autonomous Driving

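Researchers are developing advanced vision-language models (VLMs) for autonomous driving, with a focus on efficiency and spatial reasoning. New methods such as Fast-dDrive aim to balance high-fidelity trajectory planning against faster inference, and they outperform existing models on key benchmarks. Other approaches, such as SpaceDrive, inject spatial awareness explicitly by treating 3D coordinates as positional encodings rather than as text tokens, which improves planning accuracy. In addition, a new benchmark called DriveSpatial evaluates the spatiotemporal intelligence of VLMs in autonomous driving and reveals a significant gap between current models and humans in building coherent representations of complex scenes.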
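The summary says SpaceDrive treats 3D coordinates as positional encodings rather than text tokens. As a rough illustration of that general idea only, the Python sketch below maps a 3D point to a fixed-length sinusoidal vector and adds it to a token embedding. The function names (encode_3d_position, add_position), the power-of-two frequency schedule, and the additive fusion are assumptions chosen for illustration; they are not taken from the SpaceDrive paper.

```python
import math


def encode_3d_position(x: float, y: float, z: float, num_freqs: int = 4) -> list[float]:
    # Illustrative sinusoidal encoding (assumed scheme, not SpaceDrive's):
    # for each axis and each frequency band 1, 2, 4, ..., emit sin and cos
    # of the scaled coordinate.
    features: list[float] = []
    for coord in (x, y, z):
        for k in range(num_freqs):
            freq = 2.0 ** k  # frequency doubles with each band
            features.append(math.sin(freq * coord))
            features.append(math.cos(freq * coord))
    return features  # length = 3 axes * num_freqs bands * 2 (sin, cos)


def add_position(embedding: list[float], xyz: tuple[float, float, float]) -> list[float]:
    # Hypothetical fusion step: add the encoding element-wise to an existing
    # token embedding, so position is carried inside the vector itself rather
    # than spelled out as extra text tokens.
    if len(embedding) % 6 != 0:
        raise ValueError("embedding size must be a multiple of 6 (3 axes x sin/cos)")
    pe = encode_3d_position(*xyz, num_freqs=len(embedding) // 6)
    return [e + p for e, p in zip(embedding, pe)]


if __name__ == "__main__":
    vec = encode_3d_position(12.5, -3.0, 0.0)
    print(len(vec))  # 24
    # Naive character split, standing in for a coordinate written out as text.
    print(list(str(12.5)))  # ['1', '2', '.', '5']
```

The point being illustrated is that a coordinate such as 12.5 becomes a fixed-size numeric vector whose components vary smoothly with position, instead of a variable-length string of digit tokens that the language model has to parse.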

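Impact: Advances in VLMs for autonomous driving promise more efficient and more spatially aware systems, although current models still lag behind human performance on complex reasoning.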

Ranking rationale: Several research papers introduce new models, benchmarks, and techniques for VLMs in autonomous driving.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.CL TIER_1 English(EN) · Kewei Zhang, Jin Wang, Sensen Gao, Chengyue Wu, Yulong Cao, Songyang Han, Boris Ivanovic, Langechuan Liu, Marco Pavone, Song Han, Daquan Zhou, Enze Xie

    Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

    arXiv:2605.23163v1 Announce Type: new Abstract: End-to-end autonomous driving via Vision-Language-Action (VLA) models demands a precarious balance between high-fidelity trajectory planning and efficient inference. Existing paradigms typically fall short: autoregressive (AR) VLAs …

  2. arXiv cs.CL TIER_1 English(EN) · Enze Xie

    Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

    End-to-end autonomous driving via Vision-Language-Action (VLA) models demands a precarious balance between high-fidelity trajectory planning and efficient inference. Existing paradigms typically fall short: autoregressive (AR) VLAs are memory-bandwidth-bound on edge hardware and …

  3. arXiv cs.CV TIER_1 English(EN) · Hao Vo, Khoa Vo, Phu Loc Nguyen, Sieu Tran, Duc Minh Nguyen, Ngo Xuan Cuong, Gladys Gawugah, Sreevenkata Anjani Tishita Godavarthi, Chase Rainwater, Nghi D. Q. Bui, Anh Nguyen, Duy Minh Ho Nguyen, Ngan Le

    DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving

    arXiv:2605.23176v1 Announce Type: new Abstract: Spatiotemporal intelligence in autonomous driving (AD) requires an agent to integrate multi-view observations into a coherent scene representation, maintain object continuity across viewpoints and time, and reason about spatial rela…

  4. arXiv cs.CV TIER_1 English(EN) · Florian Wintel, Sigmund H. Høeg, Gabriel Kiss, Frank Lindseth

    Using Ensemble Diffusion to Estimate Uncertainty for End-to-End Autonomous Driving

    arXiv:2506.00560v2 Announce Type: replace-cross Abstract: End-to-end planning systems for autonomous driving are rapidly improving, especially in closed-loop simulation environments like CARLA. Many such driving systems either do not consider uncertainty as part of the plan itsel…

  5. arXiv cs.CV TIER_1 English(EN) · Peizheng Li, Zhenghao Zhang, David Holtz, Hang Yu, Yutong Yang, Yuzhi Lai, Rui Song, Andreas Geiger, Andreas Zell

    SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving

    arXiv:2512.10719v2 Announce Type: replace Abstract: End-to-end autonomous driving methods built on vision language models (VLMs) have undergone rapid development driven by their universal visual understanding and strong reasoning capabilities obtained from the large-scale pretrai…