Researchers have developed new methods to evaluate and improve how vision-language models (VLMs) understand human gaze. One study introduces EyeVLM, a framework for benchmarking VLMs on gaze following and social gaze prediction, and finds that current models lack a precise understanding of gaze. A separate paper proposes a training mechanism that combines local LoRA with an out-of-cone penalty to strengthen gaze reasoning in vision foundation models on gaze-following tasks, achieving state-of-the-art results.
Impact: New benchmarks and training techniques could lead to more sophisticated AI systems capable of understanding human attention and social cues.
Ranking rationale: The cluster contains two academic papers detailing new benchmarks and methods for evaluating and improving vision-language models' understanding of human gaze.
Read on Hugging Face Daily Papers →
- EyeVLM
- Social gaze prediction
- Vision-Language Models
- GazeFollow
- Gaze reasoning
- vision foundation models
- Hengfei Wang
AI-generated summary · Google Gemini · from 5 sources.