OmniDrive-R1 enhances autonomous driving VLMs with reinforcement-driven visual grounding

By PulseAugur Editorial Team · [1 source] · 2026-05-01 04:00

Researchers have introduced OmniDrive-R1, a framework for autonomous driving that integrates perception and reasoning through an interleaved Multi-modal Chain-of-Thought (iMCoT) mechanism. It targets object hallucination, a common failure of Vision-Language Models, by training a reinforcement-driven visual grounding capability. Training is annotation-free: the Clip-GRPO algorithm generates a grounding reward without requiring dense localization labels. According to the authors, experiments show that OmniDrive-R1 improves reasoning scores and accuracy over baseline models.
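The summary does not say how Clip-GRPO computes its grounding reward, so the following is only a minimal sketch under stated assumptions. It assumes the reward is a CLIP-style cosine similarity between an embedding of the image region the model selected and an embedding of the text it refers to, and it applies the standard GRPO step of normalizing rewards within a group of sampled responses. The function names and the toy two-dimensional vectors are hypothetical stand-ins, not the paper's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def grounding_reward(region_embedding, text_embedding):
    """Hypothetical label-free reward: how well the selected region matches the text.

    Assumption: both embeddings come from a CLIP-like encoder, so no bounding-box
    annotation is needed to score the region.
    """
    return cosine_similarity(region_embedding, text_embedding)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of samples."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy vectors standing in for real embeddings (assumption for illustration only).
text_embedding = [1.0, 0.0]
sampled_regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

rewards = [grounding_reward(region, text_embedding) for region in sampled_regions]
advantages = group_relative_advantages(rewards)
```

In this sketch the region that best matches the text receives the largest advantage, and the advantages within a group sum to approximately zero, so the policy update favors better-grounded responses relative to the other samples for the same prompt.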

Impact: Introduces an approach to improving VLM reliability in safety-critical autonomous driving applications.

Ranking rationale: This is a research paper detailing a new model and methodology for autonomous driving.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Zhenguo Zhang, Haohan Zheng, Yishen Wang, Le Xu, Tianchen Deng, Xuefeng Chen, Qu Chen, Bo Zhang, Wuxiong Huang · 2026-05-01 04:00

OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving

arXiv:2512.14044v3 Announce Type: replace-cross Abstract: The deployment of Vision-Language Models (VLMs) in safety-critical domains like autonomous driving (AD) is critically hindered by reliability failures, most notably object hallucination. This failure stems from their relia…
