MLLMs adapted for nuanced video retrieval, achieving SOTA performance

By PulseAugur Editorial Team · [1 source] · 2026-04-27 04:00

Researchers have developed a video retrieval method that handles nuanced queries more reliably. The approach adapts Multimodal Large Language Models (MLLMs) into an embedding model that captures temporal actions, negation, and multimodal composition. Fine-tuned with a contrastive loss and carefully selected hard negatives, the model achieves state-of-the-art performance on nuanced video retrieval benchmarks, even when trained on text only.
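The summary names a contrastive loss with hard negatives but does not give its exact form. As a rough illustration only, the sketch below uses an InfoNCE-style loss over cosine similarities, which is a common choice and an assumption here, not the paper's confirmed objective. The function names, the temperature of 0.07, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_with_hard_negatives(query, positive, hard_negatives, temperature=0.07):
    """InfoNCE-style loss for one query (assumed form, not taken from the paper).

    query, positive: embedding vectors of a query and its matching item.
    hard_negatives: embeddings of near-miss items, e.g. a caption with the
        action order reversed or a negation flipped.
    """
    # Logit 0 is the positive pair; the rest are the hard negatives.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in hard_negatives]

    # Numerically stable log-sum-exp over all logits.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))

    # Negative log-probability assigned to the positive.
    return log_sum_exp - logits[0]


# Toy 2-D embeddings (made up for illustration).
query = [1.0, 0.0]
positive = [1.0, 0.0]
easy_negative = [-1.0, 0.0]   # clearly unrelated item
hard_negative = [0.9, 0.1]    # near-duplicate that differs in one detail

loss_easy = info_nce_with_hard_negatives(query, positive, [easy_negative])
loss_hard = info_nce_with_hard_negatives(query, positive, [hard_negative])
```

In this toy setup the hard negative yields a larger loss than the easy one, which is the usual motivation for mining hard negatives: they supply a training signal that easy negatives do not.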

Impact: Improves video search by enabling more precise retrieval from complex textual queries.

Ranking rationale: This is a research paper detailing a new method for video retrieval using MLLMs.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CV TIER_1 English(EN) · Piyush Bagad, Andrew Zisserman · 2026-04-27 04:00

Adapting MLLMs for Nuanced Video Retrieval

arXiv:2512.13511v2 Announce Type: replace Abstract: Our objective is to build an embedding model that captures the nuanced relationship between a search query and candidate videos. We cover three aspects of nuanced retrieval: (i) temporal, (ii) negation, and (iii) multimodal. For…
