Researchers have discovered that Vision-Language Models (VLMs) struggle to accurately infer human gaze direction, often mistaking head orientation for the direction of the eyes. In a study involving 1,360 real-world images, VLMs showed a significant performance gap relative to humans in identifying gaze targets. The primary cause identified was the models' reliance on head-orientation cues rather than actual eye appearance, suggesting a data-driven bias that future work aims to address for more effective human-AI interaction.
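To make the described evaluation setup concrete, here is a minimal sketch of a gaze-target benchmark loop. Everything in it is illustrative: the `GazeExample` structure, the `query_vlm` stub, and the demo filenames are assumptions for exposition, not the study's actual code, prompts, or data.

```python
# Hypothetical sketch of a gaze-target evaluation harness, in the spirit of
# the study summarized above. Names and data here are illustrative only.
from dataclasses import dataclass

@dataclass
class GazeExample:
    image_path: str        # real-world photo containing a person
    candidates: list[str]  # possible gaze targets visible in the image
    ground_truth: str      # the target the person is actually looking at

def query_vlm(image_path: str, candidates: list[str]) -> str:
    """Stand-in for a real VLM call (API or local model).

    A head-orientation-biased model, as the study suggests, would tend to
    pick whatever the head points toward, ignoring the eyes themselves.
    """
    return candidates[0]  # placeholder prediction

def evaluate(dataset: list[GazeExample]) -> float:
    """Fraction of examples where the model names the correct gaze target."""
    correct = sum(
        query_vlm(ex.image_path, ex.candidates) == ex.ground_truth
        for ex in dataset
    )
    return correct / len(dataset)

if __name__ == "__main__":
    demo = [
        GazeExample("person_at_desk.jpg", ["laptop", "coffee cup"], "coffee cup"),
        GazeExample("street_scene.jpg", ["car", "dog"], "dog"),
    ]
    print(f"accuracy: {evaluate(demo):.2f}")
```

Comparing such a model accuracy against human annotators on the same items is one plausible way to measure the human-VLM gap the study reports.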
IMPACT Highlights a current limitation in VLMs' ability to interpret human non-verbal cues, potentially impacting human-AI interaction technologies.
RANK_REASON Academic paper detailing a specific limitation in current Vision-Language Models.