Brief · PulseAugur

RESEARCH · Hugging Face Daily Papers · 6d · [5 sources]

Enhancing Gaze Reasoning in Vision Foundation Models for Gaze Following

Researchers have developed new methods to evaluate and improve how vision-language models (VLMs) understand human gaze. One study introduces EyeVLM, a framework for benchmarking VLMs on gaze following and social gaze prediction, and finds that current models lack a precise understanding of gaze. A separate paper proposes a training mechanism that combines local LoRA with an out-of-cone penalty to strengthen gaze reasoning in vision foundation models, and reports state-of-the-art results on gaze following.
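The summary does not say how the out-of-cone penalty is defined. As a rough illustration only, one plausible reading is a term that penalizes predicted gaze-target mass falling outside a cone opening from the person's eye along their gaze direction. The sketch below assumes a 2D heatmap, a known eye position, and a fixed half-angle; the function name `out_of_cone_penalty` and all of its parameters are hypothetical and are not taken from the paper.

```python
import math


def out_of_cone_penalty(heatmap, eye, direction, half_angle_deg=30.0):
    """Fraction of heatmap mass lying outside a 2D gaze cone.

    Illustrative assumption, not the paper's formulation:
    heatmap is a list of rows (indexed [y][x]) of non-negative scores,
    eye is the (x, y) cone apex, direction is the (dx, dy) cone axis.
    """
    ex, ey = eye
    dx, dy = direction
    d_norm = math.hypot(dx, dy)
    if d_norm == 0:
        raise ValueError("direction must be non-zero")

    # A pixel is inside the cone when the angle between (pixel - eye)
    # and the gaze direction is at most half_angle_deg.
    cos_limit = math.cos(math.radians(half_angle_deg))

    total = 0.0
    outside = 0.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            total += value
            vx, vy = x - ex, y - ey
            v_norm = math.hypot(vx, vy)
            if v_norm == 0:
                continue  # the apex itself is treated as inside the cone
            cos_angle = (vx * dx + vy * dy) / (v_norm * d_norm)
            if cos_angle < cos_limit:
                outside += value

    # Normalize so the penalty lies in [0, 1]; an empty heatmap costs nothing.
    return outside / total if total > 0 else 0.0
```

In a training setup, a term like this would presumably be added to the main gaze-following loss with a small weight, so that predictions behind or far to the side of the gaze direction are discouraged.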

IMPACT: New benchmarks and training techniques could lead to AI systems that better understand human attention and social cues.

Vision-Language Models
Social gaze prediction
EyeVLM
GazeFollow
Gaze reasoning
Vision foundation models
Hengfei Wang