PulseAugur
Live 19:49:40

New research benchmarks and enhances VLM gaze understanding

Researchers have developed new methods to evaluate and improve how vision-language models (VLMs) understand human gaze. One study introduces EyeVLM, a framework for benchmarking VLMs on gaze following and social gaze prediction, and finds that current models lack a precise understanding of where people are looking. A separate paper proposes a training mechanism that combines local LoRA with an out-of-cone penalty to strengthen gaze reasoning in vision foundation models for gaze following, and reports state-of-the-art results.
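The summary names an "out-of-cone penalty" without defining it. A plausible reading is a loss term that discourages the model from placing predicted gaze-target probability outside a cone anchored at the person's head and opening along their gaze direction. The sketch below illustrates that idea only; it is not the paper's formulation. The function name `out_of_cone_penalty`, the 2D cone geometry, and the default 30° half-angle are all hypothetical assumptions.

```python
import math
from typing import List, Tuple


def out_of_cone_penalty(
    heatmap: List[List[float]],
    head_xy: Tuple[float, float],
    gaze_dir: Tuple[float, float],
    half_angle_deg: float = 30.0,  # hypothetical cone half-angle
) -> float:
    """Return the fraction of heatmap mass lying outside a 2D gaze cone.

    Hypothetical illustration, NOT the paper's method: the cone starts at
    head_xy (pixel coords, x right, y down) and opens around gaze_dir.
    heatmap[y][x] holds non-negative predicted gaze-target scores.
    """
    dx, dy = gaze_dir
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("gaze_dir must be non-zero")
    dx, dy = dx / norm, dy / norm  # unit gaze direction
    cos_limit = math.cos(math.radians(half_angle_deg))
    hx, hy = head_xy

    total = 0.0
    outside = 0.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            total += value
            vx, vy = x - hx, y - hy  # vector from head to this pixel
            dist = math.hypot(vx, vy)
            if dist == 0:
                continue  # the head pixel itself is treated as inside
            cos_angle = (vx * dx + vy * dy) / dist
            if cos_angle < cos_limit:  # angle exceeds the half-angle
                outside += value
    return outside / total if total > 0 else 0.0


# Usage: head at (0, 2) looking right along +x.
h = [[0.0] * 5 for _ in range(5)]
h[2][4] = 1.0  # mass straight ahead: inside the cone
h[0][0] = 1.0  # mass directly "above" the head: outside the cone
print(out_of_cone_penalty(h, head_xy=(0, 2), gaze_dir=(1, 0)))
```

Because the cone mask does not depend on the heatmap, the same computation written with tensors in a training framework would remain differentiable with respect to the predicted heatmap, which is what would let such a term act as a loss.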

Impact: New benchmarks and training techniques could lead to AI systems that better understand human attention and social cues.

Ranking rationale: The cluster contains two academic papers that present new benchmarks and methods for evaluating and improving vision-language models' understanding of human gaze.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 5 sources.


Sources [5]

  1. Hugging Face Daily Papers · TIER_1

    Enhancing Gaze Reasoning in Vision Foundation Models for Gaze Following

    Gaze following requires both scene understanding and gaze reasoning to localize the gaze target of an in-scene person. Recently, vision foundation models (VFMs) have demonstrated strong performance on this task, enabling simpler architectures while outperforming prior methods. Ho…

  2. arXiv cs.CV · TIER_1 · Hengfei Wang, Anshul Gupta, Pierre Vuillecard, Jean-Marc Odobez

    Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models

    arXiv:2605.19859v2 (announce type: replace). Abstract: Vision-language models (VLMs) have rapidly evolved into general-purpose multimodal reasoners with strong zero-shot generalization. In this context, VLMs could greatly benefit the analysis of human gaze and attention, a central t…

  3. arXiv cs.CV · TIER_1 · Shijing Wang, Yaping Huang, Chaoqun Cui, David Wong, Yihua Cheng, Alexandros Neophytou, Hyung Jin Chang

    Enhancing Gaze Reasoning in Vision Foundation Models for Gaze Following

    arXiv:2605.22607v1 (announce type: new). Abstract: Gaze following requires both scene understanding and gaze reasoning to localize the gaze target of an in-scene person. Recently, vision foundation models (VFMs) have demonstrated strong performance on this task, enabling simpler arc…

  4. arXiv cs.CV · TIER_1 · Hyung Jin Chang

    Enhancing Gaze Reasoning in Vision Foundation Models for Gaze Following

    Gaze following requires both scene understanding and gaze reasoning to localize the gaze target of an in-scene person. Recently, vision foundation models (VFMs) have demonstrated strong performance on this task, enabling simpler architectures while outperforming prior methods. Ho…

  5. arXiv cs.CV · TIER_1 · Jean-Marc Odobez

    Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models

    Vision-language models (VLMs) have rapidly evolved into general-purpose multimodal reasoners with strong zero-shot generalization. In this context, VLMs could greatly benefit the analysis of human gaze and attention, a central task in human behavior understanding that requires re…