Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

VLMs predict pedestrian intent from egocentric video

By the PulseAugur editorial team · [3 sources] · 2026-06-08 07:39

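Researchers have developed a new method for predicting whether a pedestrian intends to cross the street, using egocentric (first-person) vision and vision language models (VLMs). By framing the task as visual question answering, they fine-tuned VLMs that significantly outperform existing Transformer-based models. Adding contextual cues such as eye gaze and ego-motion further improved prediction accuracy, setting a new state of the art for this safety-critical application.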
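The summary says the task is framed as visual question answering, optionally enriched with gaze and ego-motion cues. Below is a minimal sketch of what such a framing could look like, assuming the cues are rendered as text in the question. `ClipSample`, `to_vqa`, `parse_answer`, the cue fields, and the question wording are all hypothetical illustrations, not the authors' actual prompt, data format, or code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClipSample:
    # Hypothetical container for one short egocentric clip and its label.
    frames: list                              # placeholder for decoded video frames
    will_cross: bool                          # ground-truth crossing intention
    gaze_summary: Optional[str] = None        # hypothetical textual eye-gaze cue
    ego_motion_summary: Optional[str] = None  # hypothetical textual ego-motion cue


def to_vqa(sample: ClipSample) -> dict:
    """Turn a labeled clip into a (frames, question, answer) triple for VLM fine-tuning."""
    context = []
    # Prepend optional context cues to the question only when they are available.
    if sample.gaze_summary:
        context.append(f"Eye gaze: {sample.gaze_summary}.")
    if sample.ego_motion_summary:
        context.append(f"Ego-motion: {sample.ego_motion_summary}.")
    question = " ".join(context + ["Is the person about to cross the street? Answer yes or no."])
    answer = "yes" if sample.will_cross else "no"
    return {"frames": sample.frames, "question": question, "answer": answer}


def parse_answer(text: str) -> bool:
    """Map the model's free-text reply back to a binary crossing prediction."""
    return text.strip().lower().startswith("yes")
```

Casting the label as a short yes/no answer lets a generative VLM be fine-tuned with its ordinary text loss, and `parse_answer` converts its reply back into a binary prediction for scoring. Encoding the cues as text is only one option; the paper may feed them to the model differently.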

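Impact: Sets a new state of the art for pedestrian intention prediction, with the potential to improve autonomous-driving safety systems.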

Ranking rationale: This cluster contains an academic paper describing a new research method and its benchmark results.


AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.AI TIER_1 · Danya Li, Xiang Su, Yan Feng, Rico Krueger · 2026-06-09 04:00

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

arXiv:2606.09142v1 Announce Type: cross Abstract: Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions fro…
Hugging Face Daily Papers TIER_1 · 2026-06-08 07:39

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions from short egocentric video clips. We approach this b…
arXiv cs.CV TIER_1 · Rico Krueger · 2026-06-08 07:39

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions from short egocentric video clips. We approach this b…
