Comparing Human Gaze and Vision-Language Model Attention in Safety-Relevant Environments

AI models show human-like attention in safety-critical scenarios

By the PulseAugur editorial team · [1 source] · 2026-06-16 04:00

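A study recently posted on arXiv compares the visual attention of large vision-language models (VLMs) with human gaze patterns in safety-critical environments. The researchers collected eye-tracking data from participants viewing hazardous scenes, then prompted models including GPT-4o, Gemini Pro, Gemini Flash, and Claude to predict human attention. The results indicate that VLMs can identify regions of interest that broadly align with human visual focus, which suggests they could serve as scalable tools for approximating human attention patterns without explicit eye-tracking training.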
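One simple way to quantify the kind of agreement described above is to count how many human fixation points land inside the regions a model marks as important. The summary does not say which metric the authors used, so the following is only a minimal sketch under assumptions: the `Box` type, the `fixation_hit_rate` function, the pixel-coordinate format, and the sample data are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region in pixel coordinates (hypothetical format)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # A point on the border counts as inside.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def fixation_hit_rate(fixations, boxes):
    """Fraction of human fixation points that fall inside any predicted box."""
    if not fixations:
        return 0.0
    hits = sum(1 for x, y in fixations if any(b.contains(x, y) for b in boxes))
    return hits / len(fixations)


# Made-up illustrative data, not results from the paper.
fixations = [(120, 80), (130, 95), (400, 300), (50, 50)]
predicted = [Box(100, 60, 160, 120), Box(380, 280, 450, 340)]

print(fixation_hit_rate(fixations, predicted))  # 3 of 4 fixations hit a box -> 0.75
```

A hit rate like this is easy to interpret but rewards very large predicted regions, so a real comparison would likely pair it with an overlap or saliency-map measure.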

Impact: The findings suggest that VLMs can approximate human attention patterns, which may aid safety analysis and design.

Ranking rationale: This cluster contains one academic paper detailing a comparative study of AI model attention and human gaze.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Marta Vallejo, Siwen Wang · 2026-06-16 04:00

Comparing Human Gaze and Vision-Language Model Attention in Safety-Relevant Environments

arXiv:2606.15202v1. Abstract: Human visual attention plays an important role in how people perceive and respond to environments containing potential risks. This study investigates whether large vision-language models can identify the same regions of a scene that…
