Researchers have discovered that Vision-Language Models (VLMs) struggle to accurately infer human gaze direction, often mistaking head orientation for the direction of the eyes. In a study involving 1,360 real-world images, VLMs showed a significant performance gap relative to humans in identifying gaze targets. The primary cause identified was the models' reliance on head-orientation cues rather than actual eye appearance, suggesting a data-driven bias that future work aims to address for more effective human-AI interaction.
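To make the described evaluation setup concrete, here is a minimal sketch of a gaze-target benchmark loop. Everything in it is illustrative: the `GazeExample` structure, the `query_vlm` stub, and the demo filenames are assumptions for exposition, not the study's actual code, prompts, or data.

```python
# Hypothetical sketch of a gaze-target evaluation harness, in the spirit of
# the study summarized above. Names and data here are illustrative only.
from dataclasses import dataclass

@dataclass
class GazeExample:
    image_path: str        # real-world photo containing a person
    candidates: list[str]  # possible gaze targets visible in the image
    ground_truth: str      # the target the person is actually looking at

def query_vlm(image_path: str, candidates: list[str]) -> str:
    """Stand-in for a real VLM call (API or local model).

    A head-orientation-biased model, as the study suggests, would tend to
    pick whatever the head points toward, ignoring the eyes themselves.
    """
    return candidates[0]  # placeholder prediction

def evaluate(dataset: list[GazeExample]) -> float:
    """Fraction of examples where the model names the correct gaze target."""
    correct = sum(
        query_vlm(ex.image_path, ex.candidates) == ex.ground_truth
        for ex in dataset
    )
    return correct / len(dataset)

if __name__ == "__main__":
    demo = [
        GazeExample("person_at_desk.jpg", ["laptop", "coffee cup"], "coffee cup"),
        GazeExample("street_scene.jpg", ["car", "dog"], "dog"),
    ]
    print(f"accuracy: {evaluate(demo):.2f}")
```

Comparing such a model accuracy against human annotators on the same items is one plausible way to measure the human-VLM gap the study reports.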
IMPACT Highlights a current limitation in VLMs' ability to interpret human non-verbal cues, potentially impacting human-AI interaction technologies.
RANK_REASON Academic paper detailing a specific limitation in current Vision-Language Models.