Zero-Shot Temporal Action Localization Through Textual Guidance

New TEGU method uses text to localize unseen actions in video

By the PulseAugur editorial team · [2 sources] · 2026-05-21 09:05

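Researchers have developed a new method, TEGU, for zero-shot temporal action localization in video. The method draws on textual information from large language models and captions to sharpen fine-grained discrimination between actions, particularly when labeled training data is scarce. TEGU aims to overcome the limitations of existing vision and language models in telling apart subtly different actions. Experiments on the THUMOS14 and ActivityNet-v1.3 datasets show that TEGU outperforms current state-of-the-art methods that do not rely on training data.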

Impact: Improves video understanding by using textual guidance to localize previously unseen actions.

Ranking rationale: This cluster contains an academic paper detailing a new method for video analysis.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CV TIER_1 English(EN) · Benedetta Liberatori, Alessandro Conti, Lorenzo Vaquero, Paolo Rota, Yiming Wang, Elisa Ricci · 2026-05-22 04:00

Zero-Shot Temporal Action Localization Through Textual Guidance

arXiv:2605.22201v1 Announce Type: new Abstract: Zero-shot temporal action localization (ZS-TAL) consists of classifying and localizing actions in untrimmed videos, where action classes are unseen at training time. Existing work uses Vision and Language Models (VLMs), taking advan…

arXiv cs.CV TIER_1 English(EN) · Elisa Ricci · 2026-05-21 09:05

Zero-Shot Temporal Action Localization Through Textual Guidance

Zero-shot temporal action localization (ZS-TAL) consists of classifying and localizing actions in untrimmed videos, where action classes are unseen at training time. Existing work uses Vision and Language Models (VLMs), taking advantage of their strong zero-shot transfer capabili…
