Researchers have developed new methods to evaluate and improve how vision-language models (VLMs) understand human gaze. One study introduces EyeVLM, a framework for benchmarking VLMs on gaze following and social gaze prediction, and finds that current models lack a precise understanding of gaze. A separate paper proposes a training mechanism that uses local LoRA and an out-of-cone penalty to enhance gaze reasoning in vision foundation models for gaze following, reporting state-of-the-art results.
AI IMPACT New benchmarks and training techniques could lead to more sophisticated AI systems capable of understanding human attention and social cues.
RANK_REASON The cluster contains two academic papers detailing new benchmarks and methods for evaluating and improving vision-language models' understanding of human gaze.
- EyeVLM
- Social gaze prediction
- Vision-Language Models
- GazeFollow
- Gaze reasoning
- vision foundation models
- Hengfei Wang
AI-generated summary · Google Gemini · from 5 sources.