One Identity, Many Roles: Multimodal Entity Coreference for Enhanced Video Situation Recognition

New AI model enhances video understanding by linking entities across roles and appearances

By the PulseAugur editorial team · [1 source] · 2026-04-28 04:00

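Researchers have developed a new method, Multimodal Entity Coreference (MEC), to improve video situation recognition. The method links the textual descriptions of an entity to its visual representations across the different scenes and appearances in which it occurs in a video. By unifying event-role mentions with visual entity clusters, MEC improves the accuracy of video captions and the grounding of entities in video frames.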
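The idea of unifying event-role mentions with visual entity clusters can be illustrated with a toy sketch. This is not the paper's method: the embeddings, the cluster names, and the `link_mentions_to_clusters` helper are all hypothetical, and nearest-centroid matching by cosine similarity is only an assumed stand-in for whatever linking the authors actually use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_mentions_to_clusters(mentions, clusters):
    """Assign each role mention to the visual cluster with the most similar centroid.

    mentions: {(event, role, text): embedding}   # hypothetical text embeddings
    clusters: {cluster_id: centroid_embedding}   # hypothetical visual centroids
    Returns {cluster_id: [mention keys]}; mentions sharing a cluster are
    treated as referring to the same entity.
    """
    links = {}
    for mention, vec in mentions.items():
        best = max(clusters, key=lambda cid: cosine(vec, clusters[cid]))
        links.setdefault(best, []).append(mention)
    return links


# Made-up embeddings in a shared 3-dimensional space, for illustration only.
mentions = {
    ("event1", "agent", "man in red jacket"): [0.9, 0.1, 0.0],
    ("event2", "patient", "the driver"): [0.8, 0.2, 0.1],
    ("event2", "agent", "woman with bag"): [0.1, 0.9, 0.2],
}
clusters = {"person_A": [1.0, 0.0, 0.0], "person_B": [0.0, 1.0, 0.0]}

links = link_mentions_to_clusters(mentions, clusters)
for cluster_id, linked in links.items():
    print(cluster_id, [text for _, _, text in linked])
```

In this toy data, "man in red jacket" (an agent in one event) and "the driver" (a patient in another) land in the same cluster, which is the "one identity, many roles" situation the title describes.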

Impact: Enhances video understanding by improving entity consistency across the visual and textual modalities.

Ranking rationale: An academic paper introducing a new method for video situation recognition.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Balaji Darur, Amanmeet Garg, Makarand Tapaswi · 2026-04-28 04:00

One Identity, Many Roles: Multimodal Entity Coreference for Enhanced Video Situation Recognition

arXiv:2604.23173v1. Abstract: Video Situation Recognition (VidSitu) addresses the challenging problem of "who did what to whom, with what, how, and where" in a video. It tests thorough video understanding by requiring identification of salient actions and associ…