STAR framework boosts few-shot action recognition with LLM-guided temporal learning

By PulseAugur Editorial · [1 source] · 2026-05-13 08:54

Researchers have proposed STAR (Semantic-Temporal Adaptive Representation Learning), a framework for few-shot action recognition in video. It targets two problems in existing approaches: semantic-temporal misalignment and inadequate modeling of temporal dynamics. To address them, STAR combines a Temporal Semantic Attention mechanism, which enforces fine-grained consistency, with a Semantic Temporal Prototype Refiner that leverages Mamba blocks. The framework also uses temporally dependent class descriptors from large language models to provide long-range semantic guidance, and it reports significant gains on multiple benchmarks.
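To make the idea of aligning frames with temporally ordered descriptors more concrete, here is a minimal sketch of generic scaled dot-product cross-attention, in which each frame feature attends over a short sequence of text-descriptor embeddings. This is not the paper's Temporal Semantic Attention mechanism, whose details are not given in this summary. The function name `frame_to_descriptor_attention`, the two-dimensional toy vectors, and the "start"/"end" descriptor labels are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def frame_to_descriptor_attention(frames, descriptors):
    """Hypothetical helper: generic scaled dot-product cross-attention.

    frames:      list of per-frame feature vectors (queries)
    descriptors: list of descriptor embeddings in temporal order (keys/values)
    Returns (weights, attended): one weight row and one attended vector per frame.
    """
    dim = len(descriptors[0])
    scale = math.sqrt(dim)
    weights, attended = [], []
    for frame in frames:
        row = softmax([dot(frame, d) / scale for d in descriptors])
        weights.append(row)
        # Weighted sum of descriptor embeddings for this frame.
        attended.append(
            [sum(row[j] * descriptors[j][k] for j in range(len(descriptors)))
             for k in range(dim)]
        )
    return weights, attended


# Toy data (made up): three frames and two descriptors for one action class.
frames = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
descriptors = [[4.0, 0.0],   # assumed to describe the start of the action
               [0.0, 4.0]]   # assumed to describe the end of the action

weights, attended = frame_to_descriptor_attention(frames, descriptors)
for index, row in enumerate(weights):
    print(index, [round(w, 3) for w in row])
```

In this toy setup the first frame puts most of its weight on the first descriptor, the last frame on the second, and the middle frame splits evenly, which is the kind of frame-level alignment that a single static prompt cannot express.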

IMPACT Enhances video understanding capabilities, potentially improving applications in surveillance, robotics, and content analysis.

RANK_REASON Academic paper detailing a new framework for few-shot action recognition.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Shengjie Zhao · 2026-05-13 08:54

STAR: Semantic-Temporal Adaptive Representation Learning for Few-Shot Action Recognition

Few-shot action recognition (FSAR) requires models to generalize to novel action categories from only a handful of annotated samples. Despite progress with vision-language models, existing approaches still suffer from semantic-temporal misalignment, where static textual prompts f…
