PulseAugur
UniDrive: A Unified Vision-Language and Grounding Framework for Interpretable Risk Understanding in Autonomous Driving

UniDrive framework unifies vision-language and grounding for risk understanding in autonomous driving · Tracking 3 sources

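Researchers have introduced UniDrive, a new framework that aims to improve risk understanding in autonomous driving systems by unifying vision-language and grounding capabilities. The approach addresses a limitation of existing models, which often struggle to balance temporal reasoning with spatial precision. UniDrive combines a temporal reasoning branch with a high-resolution perception branch and uses a gated cross-attention fusion module to align dynamic context with detailed spatial evidence. The framework produces natural-language risk descriptions along with grounded bounding boxes for the identified hazards, performs strongly on benchmarks such as DRAMA-Reasoning, and is expected to improve the interpretability and trustworthiness of safety-critical autonomous driving systems.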
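The summary names a gated cross-attention fusion module but does not say how it is built. As a rough illustration only, the sketch below shows one common form of the idea: tokens from a temporal branch act as queries over tokens from a high-resolution spatial branch, and a scalar gate controls how much of the attended spatial evidence is added back. The tanh gate, the absence of learned projections, and every name here (`cross_attention`, `gated_fusion`, `gate`, the toy inputs) are assumptions made for illustration and are not taken from the UniDrive paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query mixes the values
    according to its similarity to the keys."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        out.append(mixed)
    return out


def gated_fusion(temporal_tokens, spatial_tokens, gate):
    """Residual fusion (assumed form): temporal + tanh(gate) * attended.

    Learned query/key/value projections are omitted to keep the sketch short.
    """
    attended = cross_attention(temporal_tokens, spatial_tokens, spatial_tokens)
    g = math.tanh(gate)
    return [
        [t + g * a for t, a in zip(t_row, a_row)]
        for t_row, a_row in zip(temporal_tokens, attended)
    ]


# Hypothetical toy inputs: 2 temporal tokens, 3 spatial tokens, feature width 2.
temporal = [[1.0, 0.0], [0.0, 1.0]]
spatial = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

closed = gated_fusion(temporal, spatial, gate=0.0)  # gate closed: temporal tokens pass through
opened = gated_fusion(temporal, spatial, gate=2.0)  # gate open: spatial evidence is mixed in
```

With the gate at zero the temporal tokens are returned unchanged, so a module of this shape can start as a no-op and let in spatial detail gradually as the gate is learned. Whether UniDrive initializes or parameterizes its gate this way is not stated in the sources tracked here.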

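Impact: By combining temporal and spatial data processing, the framework improves the interpretability and trustworthiness of autonomous driving systems.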

Ranking rationale: This cluster describes a research paper on a new framework for autonomous driving.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 3 sources.


Sources [3]

  1. arXiv cs.AI TIER_1 English(EN) · Xiaowei Gao, Pengxiang Li, Yitai Cheng, Ruihan Xu, James Haworth, Stephen Law, Yun Ye ·

    UniDrive: A Unified Vision-Language and Grounding Framework for Interpretable Risk Understanding in Autonomous Driving

    arXiv:2606.24759v1 Announce Type: cross Abstract: Recent multimodal large language models (MLLMs) have shown strong potential for autonomous driving scene understanding, yet existing methods still face a fundamental trade-off between temporal reasoning and spatial precision. Mode…

  2. arXiv cs.AI TIER_1 English(EN) · Yun Ye ·

    UniDrive: A Unified Vision-Language and Grounding Framework for Interpretable Risk Understanding in Autonomous Driving

    Recent multimodal large language models (MLLMs) have shown strong potential for autonomous driving scene understanding, yet existing methods still face a fundamental trade-off between temporal reasoning and spatial precision. Models that rely on single-frame or low-resolution inp…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    UniDrive: A Unified Vision-Language and Grounding Framework for Interpretable Risk Understanding in Autonomous Driving

    Recent multimodal large language models (MLLMs) have shown strong potential for autonomous driving scene understanding, yet existing methods still face a fundamental trade-off between temporal reasoning and spatial precision. Models that rely on single-frame or low-resolution inp…