PulseAugur

BabyMind Grounds Language in Child-View Video Using an Object-First Bias

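Researchers have developed BabyMind, a new method for grounding language in child-view video data. It addresses the challenge of sparse, noisy supervision by adopting an object-first inductive bias. BabyMind extracts object embeddings, links them into object files via tracking, and aligns them with utterances through a contrastive learning objective. The system achieves higher accuracy than prior methods on benchmarks such as SAYCam-S.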
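To make the alignment step concrete, here is a minimal sketch of a contrastive objective between object files and utterances. It is an illustration under stated assumptions, not the paper's implementation: the summary does not specify the loss, so a generic symmetric InfoNCE is used, mean pooling stands in for whatever aggregation BabyMind applies along a track, and the names `mean_pool`, `info_nce`, the `temperature` value, and the toy vectors are all hypothetical.

```python
import math


def mean_pool(vectors):
    """Average a track's per-frame embeddings into one object-file vector.

    Assumption: mean pooling is a stand-in for the paper's aggregation.
    """
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(object_files, utterances, temperature=0.1):
    """Symmetric InfoNCE; object_files[i] is the positive for utterances[i]."""
    objs = [l2_normalize(v) for v in object_files]
    utts = [l2_normalize(v) for v in utterances]
    # Cosine-similarity logits, scaled by the (hypothetical) temperature.
    logits = [
        [sum(a * b for a, b in zip(o, u)) / temperature for u in utts]
        for o in objs
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(c) for c in zip(*logits)]
    # Average the object-to-utterance and utterance-to-object directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(columns))


# Toy data: two tracked objects, two frames each, 2-D embeddings.
track_ball = [[1.0, 0.0], [0.8, 0.2]]
track_cup = [[0.0, 1.0], [0.2, 0.8]]
object_files = [mean_pool(track_ball), mean_pool(track_cup)]

aligned = info_nce(object_files, [[1.0, 0.1], [0.1, 1.0]])
swapped = info_nce(object_files, [[0.1, 1.0], [1.0, 0.1]])
```

On this toy data the loss is lower when each utterance embedding points toward its own object file (`aligned`) than when the pairing is swapped (`swapped`), which is the behavior a contrastive objective is meant to reward.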

Impact: Introduces a new method for improving language grounding in video, which may enhance AI understanding of visual context.

Ranking rationale: This is an academic paper detailing a new method for language grounding in video.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Sources [2]

  1. arXiv cs.CV TIER_1 English(EN) · Sathira Silva, Abrham Kahsay Gebreselasie, Muhammad Umer Sheikh, Kartik Kuckreja, Daniel Harari, Muhammad Haris Khan ·

    Objects Before Words: Object-First Inductive Biases for Grounding Language in Child-View Video

    arXiv:2606.12985v1 Announce Type: new Abstract: Learning grounded word meaning from natural experience requires resolving two ambiguities in infant-view recordings: when the named referent appears and where it is in a cluttered frame. In SAYCam-style data, caregiver speech is spa…

  2. arXiv cs.CV TIER_1 English(EN) · Muhammad Haris Khan ·

    Objects Before Words: Object-First Inductive Biases for Grounding Language in Child-View Video

    Learning grounded word meaning from natural experience requires resolving two ambiguities in infant-view recordings: when the named referent appears and where it is in a cluttered frame. In SAYCam-style data, caregiver speech is sparse and weakly synchronized with egocentric vide…