PulseAugur
Live · 19:42:56

New VLMs boost autonomous driving efficiency and spatial reasoning

Researchers are developing Vision-Language Models (VLMs) for autonomous driving that aim to be both more efficient and better at spatial reasoning. Fast-dDrive, a block-diffusion VLM, targets the trade-off between high-fidelity trajectory planning and fast inference, and is reported to outperform existing models on key benchmarks. SpaceDrive explicitly infuses spatial awareness by feeding 3D coordinates to the model as positional encodings rather than as text tokens, which improves planning accuracy. A new benchmark, DriveSpatial, evaluates the spatiotemporal intelligence of VLMs in autonomous driving and reveals a significant gap between current models and human performance, particularly in scene construction.
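To make the contrast between "coordinates as text tokens" and "coordinates as positional encodings" concrete, here is a minimal sketch of the general idea. It is not SpaceDrive's published encoder: the Transformer-style sinusoidal scheme, the function names (`sinusoidal_features`, `encode_xyz`, `feature_distance`), the 8-frequency setting, and the metre units are all assumptions for illustration. The point is that a point such as (0.5, 0.0, 0.0) becomes a fixed-length vector in which nearby positions map to similar features, instead of a variable-length string of digit tokens.

```python
import math
from typing import List, Sequence


def sinusoidal_features(value: float, num_freqs: int = 8, base: float = 10000.0) -> List[float]:
    """Encode one scalar coordinate as sin/cos pairs at geometrically spaced frequencies."""
    features: List[float] = []
    for i in range(num_freqs):
        # Frequencies fall from 1 toward 1/base (Transformer-style spacing; an assumption here).
        freq = base ** (-i / num_freqs)
        features.append(math.sin(value * freq))
        features.append(math.cos(value * freq))
    return features


def encode_xyz(point: Sequence[float], num_freqs: int = 8) -> List[float]:
    """Hypothetical helper: concatenate per-axis encodings of an (x, y, z) point (metres assumed)."""
    if len(point) != 3:
        raise ValueError("expected an (x, y, z) point")
    out: List[float] = []
    for coord in point:
        out.extend(sinusoidal_features(coord, num_freqs))
    return out


def feature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two encoded feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    origin = encode_xyz((0.0, 0.0, 0.0))
    near = encode_xyz((0.5, 0.0, 0.0))
    far = encode_xyz((30.0, 0.0, 0.0))
    print(len(origin))  # 48 = 3 axes x 8 frequencies x (sin, cos)
    # For this pair of points, the nearby one stays closer in feature space.
    print(feature_distance(origin, near) < feature_distance(origin, far))  # True
```

In a real model such a vector would typically be projected into the VLM's hidden size and added to (or substituted for) the embeddings at the relevant positions; how SpaceDrive does this is described in the paper itself, not here.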

Impact: Advances in VLMs for autonomous driving promise more efficient and spatially aware systems, though current models still lag behind human performance in complex spatial reasoning.

Why it ranks: Multiple research papers introduce new models, benchmarks, and techniques for autonomous-driving VLMs.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.CL TIER_1 · Kewei Zhang, Jin Wang, Sensen Gao, Chengyue Wu, Yulong Cao, Songyang Han, Boris Ivanovic, Langechuan Liu, Marco Pavone, Song Han, Daquan Zhou, Enze Xie ·

    Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

    arXiv:2605.23163v1 Announce Type: new Abstract: End-to-end autonomous driving via Vision-Language-Action (VLA) models demands a precarious balance between high-fidelity trajectory planning and efficient inference. Existing paradigms typically fall short: autoregressive (AR) VLAs …

  2. arXiv cs.CL TIER_1 · Enze Xie ·

    Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

    End-to-end autonomous driving via Vision-Language-Action (VLA) models demands a precarious balance between high-fidelity trajectory planning and efficient inference. Existing paradigms typically fall short: autoregressive (AR) VLAs are memory-bandwidth-bound on edge hardware and …

  3. arXiv cs.CV TIER_1 · Hao Vo, Khoa Vo, Phu Loc Nguyen, Sieu Tran, Duc Minh Nguyen, Ngo Xuan Cuong, Gladys Gawugah, Sreevenkata Anjani Tishita Godavarthi, Chase Rainwater, Nghi D. Q. Bui, Anh Nguyen, Duy Minh Ho Nguyen, Ngan Le ·

    DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving

    arXiv:2605.23176v1 Announce Type: new Abstract: Spatiotemporal intelligence in autonomous driving (AD) requires an agent to integrate multi-view observations into a coherent scene representation, maintain object continuity across viewpoints and time, and reason about spatial rela…

  4. arXiv cs.CV TIER_1 · Florian Wintel, Sigmund H. Høeg, Gabriel Kiss, Frank Lindseth ·

    Using Ensemble Diffusion to Estimate Uncertainty for End-to-End Autonomous Driving

    arXiv:2506.00560v2 Announce Type: replace-cross Abstract: End-to-end planning systems for autonomous driving are rapidly improving, especially in closed-loop simulation environments like CARLA. Many such driving systems either do not consider uncertainty as part of the plan itsel…

  5. arXiv cs.CV TIER_1 · Peizheng Li, Zhenghao Zhang, David Holtz, Hang Yu, Yutong Yang, Yuzhi Lai, Rui Song, Andreas Geiger, Andreas Zell ·

    SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving

    arXiv:2512.10719v2 Announce Type: replace Abstract: End-to-end autonomous driving methods built on vision language models (VLMs) have undergone rapid development driven by their universal visual understanding and strong reasoning capabilities obtained from the large-scale pretrai…
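The ensemble-diffusion paper listed above is about attaching an uncertainty estimate to an end-to-end driving plan. Its abstract is truncated here, so the sketch below does not reproduce the authors' method. It shows one common, generic way to turn several sampled trajectories (for example, one per member of an ensemble of diffusion planners) into an uncertainty signal: measure how much the members disagree at each future waypoint. The 2D waypoint format, the RMS-spread metric, the function names, and the 1.0 m threshold are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Waypoint = Tuple[float, float]  # (x, y) in the ego frame, metres (assumed format)


def mean_trajectory(trajs: Sequence[Sequence[Waypoint]]) -> List[Waypoint]:
    """Average the ensemble members waypoint by waypoint."""
    if not trajs:
        raise ValueError("need at least one trajectory")
    horizon = len(trajs[0])
    if any(len(t) != horizon for t in trajs):
        raise ValueError("all trajectories must share the same horizon")
    k = len(trajs)
    return [
        (sum(t[i][0] for t in trajs) / k, sum(t[i][1] for t in trajs) / k)
        for i in range(horizon)
    ]


def waypoint_spread(trajs: Sequence[Sequence[Waypoint]]) -> List[float]:
    """RMS distance of the members from the ensemble mean at each future step."""
    mean = mean_trajectory(trajs)
    k = len(trajs)
    spread: List[float] = []
    for i, (mx, my) in enumerate(mean):
        sq = sum((t[i][0] - mx) ** 2 + (t[i][1] - my) ** 2 for t in trajs)
        spread.append(math.sqrt(sq / k))
    return spread


def is_uncertain(trajs: Sequence[Sequence[Waypoint]], threshold_m: float = 1.0) -> bool:
    """Hypothetical rule: flag the plan if members disagree by more than threshold_m at any step."""
    return max(waypoint_spread(trajs)) > threshold_m


if __name__ == "__main__":
    # Three hypothetical samples that agree early and diverge at the last step.
    ensemble = [
        [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
        [(0.0, 0.0), (1.0, 0.0), (2.0, 1.5)],
        [(0.0, 0.0), (1.0, 0.0), (2.0, -1.5)],
    ]
    print([round(s, 3) for s in waypoint_spread(ensemble)])  # [0.0, 0.0, 1.225]
    print(is_uncertain(ensemble))  # True
```

Spread that grows with the planning horizon, as in this toy example, is the kind of signal a downstream safety layer could act on; the threshold would need to be calibrated rather than fixed at 1.0 m.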