PulseAugur
research · [1 source]

Vision-language models mistake head orientation for gaze direction

Researchers have found that Vision-Language Models (VLMs) struggle to infer human gaze direction accurately, often mistaking head orientation for where the eyes are actually looking. In a study using 1,360 real-world images, VLMs showed a significant performance gap relative to humans at identifying gaze targets. The models were found to rely on head-orientation cues rather than eye appearance, suggesting a data-driven bias that future work aims to address for more effective human-AI interaction.

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Highlights a current limitation in VLMs' ability to interpret human nonverbal cues, with potential consequences for human-AI interaction technologies.

RANK_REASON Academic paper detailing a specific limitation in current Vision-Language Models.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Zory Zhang, Pinyuan Feng, Bingyang Wang, Tianwei Zhao, Suyang Yu, Qingying Gao, Hokin Deng, Ziqiao Ma, Yijiang Li, Dezhi Luo

    Vision-Language Models Mistake Head Orientation for Gaze Direction: Nonverbal Conversation Cues

    arXiv:2506.05412v3 · Abstract: Where someone looks is a nonverbal communication cue that children and adults readily use. How well can Vision-Language Models (VLMs) infer gaze targets? To construct evaluation stimuli, we captured 1,360 real-world photos…
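To make the evaluation setup concrete, here is a minimal sketch of how a gaze-target benchmark like this could be scored, splitting accuracy by whether the head orientation agrees with the true gaze direction (the confound the paper points to). Everything here (GazeStimulus, dummy_vlm, the field layout) is an illustrative assumption, not the authors' actual harness or data schema.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical record for one evaluation stimulus; field names are
# illustrative, not the paper's actual annotation schema.
@dataclass
class GazeStimulus:
    image_path: str
    candidate_targets: list[str]  # multiple-choice gaze-target options
    true_target: str              # human-annotated ground truth
    head_aligned: bool            # does head orientation point at the true target?

def evaluate(model: Callable[[str, list[str]], str],
             stimuli: list[GazeStimulus]) -> dict[str, float]:
    """Score a model on gaze-target selection, reporting accuracy
    separately for stimuli where head orientation matches the true
    gaze direction and where it does not."""
    buckets = {"head_aligned": [0, 0], "head_misaligned": [0, 0]}
    for s in stimuli:
        pred = model(s.image_path, s.candidate_targets)
        key = "head_aligned" if s.head_aligned else "head_misaligned"
        buckets[key][0] += int(pred == s.true_target)
        buckets[key][1] += 1
    return {k: hits / max(n, 1) for k, (hits, n) in buckets.items()}

# Placeholder predictor that always picks the first option; swap in a
# real VLM call (an API request carrying the image and the choices).
def dummy_vlm(image_path: str, choices: list[str]) -> str:
    return choices[0]

if __name__ == "__main__":
    demo = [
        GazeStimulus("img_001.jpg", ["laptop", "cup"], "laptop", True),
        GazeStimulus("img_002.jpg", ["laptop", "cup"], "cup", False),
    ]
    print(evaluate(dummy_vlm, demo))
```

Under this framing, a model that tracks head orientation rather than the eyes would score well on the head-aligned split and poorly on the misaligned one, which is the pattern the summary above describes.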