Researchers have developed a novel method for video retrieval that enhances understanding of nuanced queries. This approach adapts Multimodal Large Language Models (MLLMs) to better interpret temporal actions, negations, and multimodal compositions. By fine-tuning the MLLM with a contrastive loss and carefully selected hard negatives, the model achieves state-of-the-art performance on nuanced video retrieval benchmarks, even with text-only training. AI
影响 Improves video search capabilities by enabling more precise retrieval based on complex textual queries.
排序理由 This is a research paper detailing a new method for video retrieval using MLLMs.
AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →