PulseAugur

New research benchmarks and improves gaze understanding in VLMs

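Researchers have developed new methods to evaluate and improve how vision-language models (VLMs) understand human gaze. One study introduces EyeVLM, a framework for benchmarking VLMs on gaze following and social gaze prediction, and finds that current models lack precise gaze understanding. A second paper proposes a novel training scheme that uses local LoRA and an out-of-view-cone penalty to strengthen gaze reasoning in vision foundation models for gaze following, achieving state-of-the-art results.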

Impact: New benchmarks and training techniques could lead to more sophisticated AI systems that understand human attention and social cues.

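Ranking rationale: This cluster contains two academic papers detailing new benchmarks and methods for evaluating and improving vision-language models' understanding of human gaze.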

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 5 sources. How we write summaries →

Sources [5]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    Enhancing Gaze Reasoning in Vision Foundation Models for Gaze Following

    Gaze following requires both scene understanding and gaze reasoning to localize the gaze target of an in-scene person. Recently, vision foundation models (VFMs) have demonstrated strong performance on this task, enabling simpler architectures while outperforming prior methods. Ho…

  2. arXiv cs.CV TIER_1 English(EN) · Hengfei Wang, Anshul Gupta, Pierre Vuillecard, Jean-Marc Odobez ·

    Focus on VLMs: Benchmarking Gaze Following and Social Gaze Prediction in Vision-Language Models

    arXiv:2605.19859v2 (announce type: replace). Abstract: Vision-language models (VLMs) have rapidly evolved into general-purpose multimodal reasoners with strong zero-shot generalization. In this context, VLMs could greatly benefit the analysis of human gaze and attention, a central t…

  3. arXiv cs.CV TIER_1 English(EN) · Shijing Wang, Yaping Huang, Chaoqun Cui, David Wong, Yihua Cheng, Alexandros Neophytou, Hyung Jin Chang ·

    Enhancing Gaze Reasoning in Vision Foundation Models for Gaze Following

    arXiv:2605.22607v1 (announce type: new). Abstract: Gaze following requires both scene understanding and gaze reasoning to localize the gaze target of an in-scene person. Recently, vision foundation models (VFMs) have demonstrated strong performance on this task, enabling simpler arc…

  4. arXiv cs.CV TIER_1 English(EN) · Hyung Jin Chang ·

    Enhancing Gaze Reasoning in Vision Foundation Models for Gaze Following

    Gaze following requires both scene understanding and gaze reasoning to localize the gaze target of an in-scene person. Recently, vision foundation models (VFMs) have demonstrated strong performance on this task, enabling simpler architectures while outperforming prior methods. Ho…

  5. arXiv cs.CV TIER_1 English(EN) · Jean-Marc Odobez ·

    Focus on VLMs: Benchmarking Gaze Following and Social Gaze Prediction in Vision-Language Models

    Vision-language models (VLMs) have rapidly evolved into general-purpose multimodal reasoners with strong zero-shot generalization. In this context, VLMs could greatly benefit the analysis of human gaze and attention, a central task in human behavior understanding that requires re…