EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos

New benchmarks and synthetic data aim to improve AI's egocentric video understanding

By the PulseAugur editorial team · [4 sources] · 2026-05-18 10:58

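Researchers have introduced new benchmarks and a synthetic data generation method to improve how large multimodal models (LMMs) perform on egocentric video. The EgoBabyVLM benchmark focuses on language grounding from naturalistic, weakly aligned egocentric video and highlights the limitations of current LMMs in that setting. Similarly, EgoExoMem uses synchronized egocentric and exocentric videos to test cross-view memory reasoning, and shows that existing models struggle to reach high accuracy. To ease the difficulty of data collection, EgoInteract provides a controllable simulator that generates synthetic egocentric video with dense annotations, and reports improved model performance on real-world benchmarks.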

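Impact: Advances in egocentric video understanding could support more capable embodied AI agents and human-computer interaction systems.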

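Ranking rationale: Several research papers introduce new benchmarks and synthetic data generation methods for egocentric video understanding.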


AI-generated summary · Google Gemini · based on 4 sources.

Sources [4]

arXiv cs.CL TIER_1 English(EN) · Emmanuel Dupoux · 2026-05-18 21:30

EgoBabyVLM: A Cross-Modal Learning Benchmark on Naturalistic Egocentric Video Data

Children acquire language grounding with remarkable robustness from limited visuo-linguistic input in ways that surpass today's best large multimodal models. Recent research suggests current vision-language models (VLMs) trained on curated web data fail to generalize to the spars…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-18 17:54

EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos

Egocentric memory is widely used in embodied intelligence, but it may be insufficient for comprehensive spatial-temporal reasoning. Inspired by human recall from both field and observer perspectives, we introduce EgoExoMem, the first benchmark for cross-view memory reasoning over…
arXiv cs.CV TIER_1 English(EN) · Rainer Stiefelhagen · 2026-05-18 17:54

EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos

Egocentric memory is widely used in embodied intelligence, but it may be insufficient for comprehensive spatial-temporal reasoning. Inspired by human recall from both field and observer perspectives, we introduce EgoExoMem, the first benchmark for cross-view memory reasoning over…
arXiv cs.CV TIER_1 English(EN) · Giovanni Maria Farinella · 2026-05-18 10:58

EgoInteract: Synthetic Egocentric Video Generation for Interaction Understanding and Prediction

Collecting large-scale egocentric video datasets with dense spatial and temporal annotations is costly, slow, and often constrained by environmental biases, privacy constraints, and limited coverage of interaction patterns. While synthetic data has shown strong potential in sever…
