From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification

Researchers adapt CLIP for efficient video understanding and person re-identification

By the PulseAugur editorial team · [5 sources] · 2026-04-24 03:37

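Researchers have developed SAGA-ReID, which improves person re-identification by rethinking how CLIP features are aggregated. The method aligns intermediate patch tokens with anchor vectors in CLIP's text embedding space, which helps emphasize stable identity evidence and suppress corrupted or missing regions, particularly under occlusion. Experiments show that SAGA-ReID substantially outperforms global pooling, with Rank-1 gains of up to +10.6 on occluded benchmarks. Separately, EV-CLIP offers an efficient framework for few-shot video action recognition: it uses mask prompts and context prompts for attention guidance and temporal modeling, addressing challenges such as low-light conditions and egocentric viewpoints.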
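To make the contrast between global pooling and anchor-guided aggregation concrete, here is a minimal sketch in plain Python. It is not the SAGA-ReID formulation, which the summary does not spell out; it only illustrates the general idea of weighting patch tokens by their similarity to anchor vectors. The function names, the max-over-anchors scoring, the softmax temperature, and the toy 3-dimensional vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # a zero vector (e.g. a blanked-out patch) matches nothing
    return dot / (nu * nv)

def global_mean_pool(patches):
    """Baseline: average all patch tokens with equal weight."""
    n = len(patches)
    return [sum(col) / n for col in zip(*patches)]

def anchor_weighted_pool(patches, anchors, temperature=0.1):
    """Hypothetical anchor-guided pooling (an assumption, not the paper's method).

    Each patch is scored by its best cosine match to any anchor, and a
    softmax over those scores yields the aggregation weights.
    """
    scores = [max(cosine(p, a) for a in anchors) for p in patches]
    peak = max(scores)  # subtract the peak for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(patches[0])
    pooled = [sum(w * p[i] for w, p in zip(weights, patches)) for i in range(dim)]
    return pooled, weights

# Toy data (made up): two patches showing the person, one showing an occluder.
patches = [
    [1.0, 0.1, 0.0],  # visible body region
    [0.9, 0.2, 0.0],  # another visible body region
    [0.0, 0.0, 1.0],  # occluding object
]
anchors = [[1.0, 0.0, 0.0]]  # stand-in for an anchor in the text embedding space

pooled, weights = anchor_weighted_pool(patches, anchors)
baseline = global_mean_pool(patches)
print("weights:", weights)
print("anchor similarity, weighted pool:", cosine(pooled, anchors[0]))
print("anchor similarity, mean pool:", cosine(baseline, anchors[0]))
```

In this toy setup the occluder patch receives almost no weight, so the pooled vector stays closer to the anchor than the equal-weight average does. A lower temperature sharpens that selection, while a higher one moves the result back toward mean pooling.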

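Impact: These papers introduce methods that improve the robustness and efficiency of computer vision tasks such as person re-identification and action recognition, which could yield better performance under challenging real-world conditions.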

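Ranking rationale: Two new research papers were published on arXiv, detailing new methods that adapt and improve existing models such as CLIP for specific computer vision tasks.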


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

arXiv cs.CV TIER_1 English(EN) · Darshan Singh S, Zeeshan Khan, Makarand Tapaswi · 2026-04-28 04:00

SRL-CLIP: Efficient CLIP Video Adaptation via Structured Semantic Role Labels

arXiv:2401.07669v2 Announce Type: replace Abstract: Adapting CLIP for videos has gained popularity due to its semantic and rich representation. While CLIP is a good starting point, it typically undergoes post-pretraining (contrastive finetuning) on large video narration or captio…
arXiv cs.CV TIER_1 English(EN) · Aotian Zheng, Winston Sun, Bahaa Alattar, Vitaly Ablavsky, Jenq-Neng Hwang · 2026-04-27 04:00

From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification

arXiv:2604.22190v1 Announce Type: new Abstract: CLIP-based person re-identification (ReID) methods aggregate spatial features into a single global [CLS] token optimized for image-text alignment rather than spatial selectivity, making representations fragile under occlusi…
arXiv cs.CV TIER_1 English(EN) · Hyo Jin Jon, Longbin Jin, Eun Yi Kim · 2026-04-27 04:00

EV-CLIP: Efficient Visual Prompt Adaptation for CLIP in Few-shot Action Recognition under Visual Challenges

arXiv:2604.22595v1 Announce Type: new Abstract: CLIP has demonstrated strong generalization in visual domains through natural language supervision, even for video action recognition. However, most existing approaches that adapt CLIP for action recognition have primarily focused o…
arXiv cs.CV TIER_1 English(EN) · Eun Yi Kim · 2026-04-24 14:23

EV-CLIP: Efficient Visual Prompt Adaptation for CLIP in Few-shot Action Recognition under Visual Challenges

CLIP has demonstrated strong generalization in visual domains through natural language supervision, even for video action recognition. However, most existing approaches that adapt CLIP for action recognition have primarily focused on temporal modeling, often overlooking spatial p…
arXiv cs.CV TIER_1 English(EN) · Jenq-Neng Hwang · 2026-04-24 03:37

From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification

CLIP-based person re-identification (ReID) methods aggregate spatial features into a single global [CLS] token optimized for image-text alignment rather than spatial selectivity, making representations fragile under occlusion and cross-camera variation. We propose SAGA-R…
