New Benchmarks and Models Advance Vision-Language Models for Autonomous Driving

By the PulseAugur editorial team · [10 sources] · 2026-05-21 00:00

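Researchers are developing new benchmarks and models to improve the capabilities of vision-language models (VLMs) in autonomous driving. Two new benchmarks, Drive-P2D and DriveSpatial, evaluate VLMs on progressive perception-to-decision tasks and on spatiotemporal reasoning, respectively, and highlight the limitations of current models in scene construction and reasoning. In parallel, Fast-dDrive, SparseWorld, and SpaceDrive propose new model architectures and methods, such as block diffusion and spatial-awareness injection, that aim to improve the efficiency, accuracy, and safety of autonomous driving systems by better balancing perception, planning, and real-time deployment requirements.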

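Impact: These advances aim to improve the safety and efficiency of autonomous driving systems by strengthening the perception, reasoning, and real-time decision-making of VLMs.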

Ranking rationale: Multiple research papers introduce new benchmarks and models for VLMs in autonomous driving.


AI-generated summary · Google Gemini · from 10 sources.

Sources [10]

arXiv cs.AI TIER_1 English(EN) · Zecong Tang, Zixu Wang, Yifei Wang, Weitong Lian, Tianjian Gao, Haoran Li, Tengju Ru, Lingyi Meng, Zhejun Cui, Yichen Zhu, Qi Kang, Kaixuan Wang, Yu Zhang · 2026-05-27 04:00

Drive-P2D: A Progressive Perception-to-Decision Benchmark for Vision-Language Models in Autonomous Driving

arXiv:2601.14702v2 Announce Type: replace Abstract: Autonomous driving requires reliable perception and safe decision-making in complex scenarios. Recent vision-language models (VLMs) demonstrate reasoning and generalization abilities, opening new possibilities for autonomous dri…
arXiv cs.CL TIER_1 English(EN) · Kewei Zhang, Jin Wang, Sensen Gao, Chengyue Wu, Yulong Cao, Songyang Han, Boris Ivanovic, Langechuan Liu, Marco Pavone, Song Han, Daquan Zhou, Enze Xie · 2026-05-25 04:00

Fast-dDrive: An Efficient Block-Diffusion Vision-Language Model for Autonomous Driving

arXiv:2605.23163v1 Announce Type: new Abstract: End-to-end autonomous driving via Vision-Language-Action (VLA) models demands a precarious balance between high-fidelity trajectory planning and efficient inference. Existing paradigms typically fall short: autoregressive (AR) VLAs …
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 00:00

Fast-dDrive: An Efficient Block-Diffusion Vision-Language Model for Autonomous Driving

Fast-dDrive introduces a block-diffusion Vision-Language-Action model for autonomous driving that improves efficiency and accuracy through structured token freezing, section-aware training, and speculative decoding techniques.
arXiv cs.CL TIER_1 English(EN) · Enze Xie · 2026-05-22 02:31

Fast-dDrive: An Efficient Block-Diffusion Vision-Language Model for Autonomous Driving

End-to-end autonomous driving via Vision-Language-Action (VLA) models demands a precarious balance between high-fidelity trajectory planning and efficient inference. Existing paradigms typically fall short: autoregressive (AR) VLAs are memory-bandwidth-bound on edge hardware and …
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-21 00:00

Learning a Unified Risk Map for Autonomous Driving in Partially Observable Environments

A unified risk map modeling framework addresses occlusion challenges in autonomous driving by integrating traffic flow and collision risks through spatiotemporal modeling and diffusion-based scenario generation.
arXiv cs.CV TIER_1 English(EN) · Kevin Richard, Alphin Varghese, Colin Pham, David Oh, Srijan Das · 2026-05-26 04:00

D2-V2X: Depth-Driven Cooperative V2X Reasoning for Autonomous Driving

arXiv:2605.24098v1 Announce Type: new Abstract: Single-vehicle Vision-Language Models (VLMs) are fundamentally constrained by sensor occlusions. While Vehicle-to-Everything (V2X) systems mitigate this, current benchmarks lack the cooperative reasoning required for resolving ambig…
arXiv cs.CV TIER_1 English(EN) · Ruoyu Wang, Jingke Wang, Yukai Ma, Yuehao Huang, Shuangming Lei, Guanglin Xu, Aixue Ye, Yong Liu · 2026-05-26 04:00

SparseWorld: Enhancing End-to-End Autonomous Driving via World Models with Sparse Scene Representations

arXiv:2605.24354v1 Announce Type: new Abstract: Recently, world models have made significant progress in enhancing end-to-end driving systems through both future situation forecasting and improved scene understanding. However, existing driving world models are typically built upo…
arXiv cs.CV TIER_1 English(EN) · Florian Wintel, Sigmund H. Høeg, Gabriel Kiss, Frank Lindseth · 2026-05-25 04:00

Using Diffusion Ensembles to Estimate Uncertainty for End-to-End Autonomous Driving

arXiv:2506.00560v2 Announce Type: replace-cross Abstract: End-to-end planning systems for autonomous driving are rapidly improving, especially in closed-loop simulation environments like CARLA. Many such driving systems either do not consider uncertainty as part of the plan itsel…
arXiv cs.CV TIER_1 English(EN) · Hao Vo, Khoa Vo, Phu Loc Nguyen, Sieu Tran, Duc Minh Nguyen, Ngo Xuan Cuong, Gladys Gawugah, Sreevenkata Anjani Tishita Godavarthi, Chase Rainwater, Nghi D. Q. Bui, Anh Nguyen, Duy Minh Ho Nguyen, Ngan Le · 2026-05-25 04:00

DRIVESPATIAL: Benchmarking Spatiotemporal Intelligence in VLMs for Autonomous Driving

arXiv:2605.23176v1 Announce Type: new Abstract: Spatiotemporal intelligence in autonomous driving (AD) requires an agent to integrate multi-view observations into a coherent scene representation, maintain object continuity across viewpoints and time, and reason about spatial rela…
arXiv cs.CV TIER_1 English(EN) · Peizheng Li, Zhenghao Zhang, David Holtz, Hang Yu, Yutong Yang, Yuzhi Lai, Rui Song, Andreas Geiger, Andreas Zell · 2026-05-22 04:00

SpaceDrive: Infusing Spatial Awareness into VLM-Based Autonomous Driving

arXiv:2512.10719v2 Announce Type: replace Abstract: End-to-end autonomous driving methods built on vision language models (VLMs) have undergone rapid development driven by their universal visual understanding and strong reasoning capabilities obtained from the large-scale pretrai…
