TAP-JEPA model achieves second place in action anticipation challenge

By PulseAugur Editorial · [1 sources] · 2026-06-02 04:00

Researchers have developed TAP-JEPA, an action anticipation model that placed second in the EPIC-KITCHENS-100 challenge. The model builds on frozen V-JEPA 2.1 features, using a ViT-G/384 encoder and a latent predictor to estimate future video tokens. These tokens are then fused with the observed context through attentive probes to predict verbs, nouns, and verb-noun actions. The submission achieved a Mean Top-5 Recall of 27.91%, missing the top spot by 0.04 percentage points.
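For readers unfamiliar with the reported metric, the following is a minimal sketch of how a class-mean top-5 recall can be computed. It assumes that "Mean Top-5 Recall" here means recall averaged over classes rather than over samples, which is the usual convention for this benchmark but is not spelled out in the summary. The function name `mean_top_k_recall` and the toy scores and labels are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def mean_top_k_recall(scores, labels, k=5):
    """Class-mean top-k recall (hypothetical helper, not the paper's code).

    scores: list of per-sample score lists, one score per class.
    labels: list of ground-truth class indices, one per sample.
    """
    hits = defaultdict(int)    # per-class count of samples whose label is in the top k
    counts = defaultdict(int)  # per-class count of samples
    for row, label in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        counts[label] += 1
        if label in top_k:
            hits[label] += 1
    # Average the per-class recalls so that rare classes weigh as much as frequent ones.
    recalls = [hits[c] / counts[c] for c in counts]
    return sum(recalls) / len(recalls)


# Toy data (assumed for illustration): 3 samples, 6 classes.
toy_scores = [
    [0.9, 0.1, 0.2, 0.3, 0.4, 0.5],  # label 1 has the lowest score: miss
    [0.1, 0.9, 0.2, 0.3, 0.4, 0.5],  # label 1 has the highest score: hit
    [0.5, 0.4, 0.3, 0.2, 0.1, 0.0],  # label 0 has the highest score: hit
]
toy_labels = [1, 1, 0]

print(mean_top_k_recall(toy_scores, toy_labels))  # 0.75
```

In this toy case class 1 has a recall of 0.5 and class 0 a recall of 1.0, so the class mean is 0.75, whereas a per-sample top-5 accuracy would be 2/3. Averaging per class is what keeps frequent actions from dominating the score.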

IMPACT This research advances action anticipation capabilities, potentially improving egocentric video analysis and human-computer interaction.

RANK_REASON This is a research paper detailing a novel model and its performance on a specific benchmark.



AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Chaoyang Wang, Lexuan Xu · 2026-06-02 04:00

TAP-JEPA: Frozen Future-Latent Probing and Two-Stage Score Fusion for EPIC-KITCHENS-100 Action Anticipation

arXiv:2606.00662v1 Announce Type: new Abstract: This report presents TAP-JEPA, our runner-up submission to the EPIC-KITCHENS-100 (EK-100) Action Anticipation Challenge at EgoVis 2026. The task is to anticipate the next verb, noun, and verb-noun action from an egocentric clip that…

