PulseAugur
Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

VLMs predict pedestrian intent from egocentric video

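Researchers developed a new method that uses egocentric vision and vision language models (VLMs) to predict pedestrians' road-crossing intentions. By framing the task as visual question answering, they fine-tuned VLMs that significantly outperform existing Transformer-based models. Adding contextual cues such as gaze and ego-motion further improved prediction accuracy, setting a new state of the art for this safety-critical application.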
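To make the visual-question-answering framing concrete, here is a minimal sketch of how one clip might be turned into a question/answer record for fine-tuning. It is an illustration under stated assumptions, not the authors' pipeline: the `ClipSample` fields, the question wording, the text encoding of gaze and ego-motion, and the yes/no answer format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClipSample:
    """One short egocentric clip plus optional context (all fields hypothetical)."""
    clip_path: str
    gaze_summary: Optional[str] = None        # e.g. "looking toward the road"
    ego_motion_summary: Optional[str] = None  # e.g. "walking, slowing down"
    will_cross: Optional[bool] = None         # label, present only for training data


# Assumed question wording; the paper's actual prompt is not given in the summary.
QUESTION = "Will the pedestrian cross the road? Answer yes or no."


def build_vqa_example(sample: ClipSample) -> dict:
    """Turn a clip into a question/answer record for supervised fine-tuning."""
    context_lines = []
    if sample.gaze_summary:
        context_lines.append(f"Gaze: {sample.gaze_summary}")
    if sample.ego_motion_summary:
        context_lines.append(f"Ego-motion: {sample.ego_motion_summary}")
    record = {
        "video": sample.clip_path,
        "question": "\n".join(context_lines + [QUESTION]),
    }
    # Only labeled samples carry a target answer.
    if sample.will_cross is not None:
        record["answer"] = "yes" if sample.will_cross else "no"
    return record


def parse_answer(text: str) -> Optional[bool]:
    """Map free-form model output back to a binary intention, or None if unclear."""
    cleaned = text.strip().lower()
    if cleaned.startswith("yes"):
        return True
    if cleaned.startswith("no"):
        return False
    return None
```

Casting the label as a short text answer lets a generic VLM be fine-tuned with its usual language-modeling objective, and the optional context lines show one simple way that cues such as gaze or ego-motion could be supplied alongside the video.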

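Impact: Sets a new state of the art for pedestrian intention prediction, with the potential to improve autonomous-driving safety systems.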

Ranking rationale: This cluster contains an academic paper detailing a new research method and benchmark results.


AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. arXiv cs.AI TIER_1 English(EN) · Danya Li, Xiang Su, Yan Feng, Rico Krueger ·

    Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

    arXiv:2606.09142v1 Announce Type: cross Abstract: Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions fro…

  2. Hugging Face Daily Papers TIER_1 English(EN) ·

    Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

    Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions from short egocentric video clips. We approach this b…

  3. arXiv cs.CV TIER_1 English(EN) · Rico Krueger ·

    Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

    Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions from short egocentric video clips. We approach this b…