ConTrans model advances zero-shot video action localization

By PulseAugur Editorial · [1 source] · 2026-06-01 04:00

Researchers have developed ConTrans, a model for zero-shot temporal action localization: detecting and locating actions in untrimmed videos that were not seen during training. The model combines convolutional layers with transformer self-attention to capture both local correlations between neighboring frames and long-range global context. ConTrans is evaluated on the ActivityNet-1.3 and THUMOS14 datasets, where it outperforms existing methods at detecting unseen actions.
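To make the local-plus-global idea concrete, here is a minimal sketch in plain Python. It is not the ConTrans architecture: the function names, the smoothing kernel, the identity attention projections, and the residual-sum fusion are all assumptions chosen for illustration. It only shows how a temporal convolution (local view) and self-attention (global view) can be applied to the same sequence of per-frame feature vectors.

```python
import math

def local_conv1d(frames, kernel):
    """Temporal 1D convolution, same weights for every feature dimension.

    Zero-padded so the output has as many time steps as the input.
    Assumes an odd-length kernel.
    """
    half = len(kernel) // 2
    num_frames, dim = len(frames), len(frames[0])
    out = []
    for t in range(num_frames):
        row = [0.0] * dim
        for k, weight in enumerate(kernel):
            s = t + k - half  # neighboring time step covered by this tap
            if 0 <= s < num_frames:
                for d in range(dim):
                    row[d] += weight * frames[s][d]
        out.append(row)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_self_attention(frames):
    """Single-head scaled dot-product self-attention.

    Queries, keys, and values are the raw frame features (identity
    projections), a simplification made for this sketch.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    out = []
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, frames)) for d in range(dim)])
    return out

def local_global_block(frames, kernel=(0.25, 0.5, 0.25)):
    """Fuse local and global views with a residual sum (assumed fusion rule)."""
    local = local_conv1d(frames, kernel)
    global_ctx = global_self_attention(frames)
    return [
        [f + l + g for f, l, g in zip(frame, loc, glob)]
        for frame, loc, glob in zip(frames, local, global_ctx)
    ]
```

Calling `local_global_block` on a list of T frame vectors returns T vectors of the same dimension. The convolution mixes each frame only with its immediate neighbors, while the attention term lets every frame draw on the whole video.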

IMPACT Outperforms existing methods on zero-shot temporal action localization, potentially improving video analysis capabilities.

RANK_REASON This is a research paper detailing a new model and its performance on academic benchmarks.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Kanchan Keisham, Thenukan Pathmanathan, Thangarajah Akilan · 2026-06-01 04:00

ConTrans: Learning Text-enhanced Local-global Temporal Representations for Zero-shot Temporal Action Localization

arXiv:2605.30689v1 · Abstract: Zero-shot Temporal Action Localization (ZS-TAL) aims to detect and locate previously unseen actions in untrimmed videos. However, existing approaches primarily focus on modeling long-range contextual information, often neglecting …

