PulseAugur

TAP-JEPA model achieves second place in action anticipation challenge

Researchers have developed TAP-JEPA, an action anticipation model that placed second in the EPIC-KITCHENS-100 Action Anticipation Challenge. The model builds on frozen V-JEPA 2.1 features, using a ViT-G/384 encoder and a latent predictor to estimate future video tokens. Attentive probes then fuse these predicted tokens with the observed context to predict verbs, nouns, and verb-noun actions. The submission reached a Mean Top-5 Recall of 27.91%, 0.04 percentage points behind the top entry.
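For readers unfamiliar with the headline metric, the following is a minimal sketch of a class-mean Top-5 Recall computation. It assumes the common definition: a sample counts as a hit when its true class is among the five highest-scoring classes, recall is computed per class, and the per-class recalls are averaged over the classes present in the ground truth. The function name and the toy data are illustrative, and the official challenge evaluation may differ in details such as which classes are included.

```python
import numpy as np

def mean_top5_recall(scores, labels, k=5):
    """Class-mean top-k recall (sketch, not the official evaluation code).

    scores: (num_samples, num_classes) array of predicted class scores.
    labels: (num_samples,) array of integer ground-truth classes.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k highest-scoring classes for each sample.
    topk = np.argsort(-scores, axis=1)[:, :k]
    # A sample is a hit if its true class appears among its top-k classes.
    hit = (topk == labels[:, None]).any(axis=1)
    # Recall per ground-truth class, then an unweighted mean over classes.
    classes = np.unique(labels)
    per_class = [hit[labels == c].mean() for c in classes]
    return float(np.mean(per_class))

# Toy data: 3 samples, 6 classes (hypothetical values).
toy_scores = [
    [6, 5, 4, 3, 2, 1],  # true class 0 is in the top 5
    [6, 5, 4, 3, 2, 1],  # true class 5 is ranked last, so it is missed
    [1, 2, 3, 4, 5, 6],  # true class 5 is ranked first
]
toy_labels = [0, 5, 5]
result = mean_top5_recall(toy_scores, toy_labels)
```

Because the average is taken over classes and not over samples, rare classes weigh as much as frequent ones, which is why small differences such as 0.04 percentage points can separate leaderboard entries.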

IMPACT This research advances action anticipation capabilities, potentially improving egocentric video analysis and human-computer interaction.

RANK_REASON This is a research paper detailing a novel model and its performance on a specific benchmark.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CV TIER_1 English(EN) · Chaoyang Wang, Lexuan Xu

    TAP-JEPA: Frozen Future-Latent Probing and Two-Stage Score Fusion for EPIC-KITCHENS-100 Action Anticipation

    arXiv:2606.00662v1. Abstract: This report presents TAP-JEPA, our runner-up submission to the EPIC-KITCHENS-100 (EK-100) Action Anticipation Challenge at EgoVis 2026. The task is to anticipate the next verb, noun, and verb-noun action from an egocentric clip that…