PulseAugur
MLLMs adapted for nuanced video retrieval, achieving SOTA performance

Researchers have developed a video retrieval method that handles nuanced queries more reliably. The approach adapts Multimodal Large Language Models (MLLMs) to interpret temporal actions, negations, and multimodal compositions. Fine-tuned with a contrastive loss on carefully selected hard negatives, the model achieves state-of-the-art performance on nuanced video retrieval benchmarks, even when trained on text only.
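
To make the idea of a contrastive loss with hard negatives concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not specify the exact loss, so this assumes a generic InfoNCE-style objective over cosine similarities. The function names, the temperature of 0.07, and the two-dimensional toy embeddings are all hypothetical, chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE-style loss (assumed form): negative log-softmax of the
    positive's similarity among the positive and all negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# Toy embeddings, made up for illustration only.
query = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negative = [-1.0, 0.0]   # far from the query
hard_negative = [0.8, 0.3]    # close to the query, but not a match

loss_easy = info_nce(query, positive, [easy_negative])
loss_hard = info_nce(query, positive, [hard_negative])
print(loss_easy < loss_hard)
```

In this sketch the hard negative yields a noticeably larger loss than the easy one, which is the usual motivation for selecting hard negatives: they supply a stronger training signal for distinctions such as negation or temporal order.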

Impact: Improves video search by enabling more precise retrieval from complex textual queries.

Ranking rationale: This is a research paper detailing a new method for video retrieval using MLLMs.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.CV · TIER_1 · English (EN) · Piyush Bagad, Andrew Zisserman

    Adapting MLLMs for Nuanced Video Retrieval

    arXiv:2512.13511v2. Abstract: Our objective is to build an embedding model that captures the nuanced relationship between a search query and candidate videos. We cover three aspects of nuanced retrieval: (i) temporal, (ii) negation, and (iii) multimodal. For…