PulseAugur

New TEGU method uses text to localize unseen actions in videos

Researchers have developed TEGU, a method for zero-shot temporal action localization in videos. It uses textual information from large language models and captions to improve fine-grained discrimination between actions, especially when labeled training data is scarce. TEGU aims to overcome the difficulty existing Vision and Language Models have in distinguishing subtle differences between actions. Experiments on the THUMOS14 and ActivityNet-v1.3 datasets show that TEGU outperforms current state-of-the-art methods that do not rely on training data.

IMPACT Improves video understanding by enabling localization of unseen actions using textual guidance.

RANK_REASON The cluster contains an academic paper detailing a new method for video analysis.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CV TIER_1 English(EN) · Benedetta Liberatori, Alessandro Conti, Lorenzo Vaquero, Paolo Rota, Yiming Wang, Elisa Ricci ·

    Zero-Shot Temporal Action Localization Through Textual Guidance

    arXiv:2605.22201v1. Abstract: Zero-shot temporal action localization (ZS-TAL) consists of classifying and localizing actions in untrimmed videos, where action classes are unseen at training time. Existing work uses Vision and Language Models (VLMs), taking advan…

  2. arXiv cs.CV TIER_1 English(EN) · Elisa Ricci ·

    Zero-Shot Temporal Action Localization Through Textual Guidance

    Zero-shot temporal action localization (ZS-TAL) consists of classifying and localizing actions in untrimmed videos, where action classes are unseen at training time. Existing work uses Vision and Language Models (VLMs), taking advantage of their strong zero-shot transfer capabili…