Researchers have introduced OmniDrive-R1, a novel framework for autonomous driving that integrates perception and reasoning using an interleaved Multi-modal Chain-of-Thought (iMCoT) mechanism. This approach addresses object hallucination issues common in Vision-Language Models by employing a reinforcement-driven visual grounding capability. The system utilizes a unique annotation-free training pipeline with the Clip-GRPO algorithm, which generates a grounding reward without requiring dense localization labels. Experiments show OmniDrive-R1 significantly boosts reasoning scores and accuracy compared to baseline models.
Impact: Introduces a novel approach to improve VLM reliability in safety-critical autonomous driving applications.
Ranking rationale: This is a research paper detailing a new model and methodology for autonomous driving.
- Clip-GRPO
- DriveLMM-o1
- OmniDrive-R1
- Qwen2.5VL-7B
- Vision-Language Models
- Zhenguo Zhang
- Chain-of-Thought
AI-generated summary · Google Gemini · from 1 source.