AI models tackle zero-shot video retrieval with reasoning

By PulseAugur Editorial · [4 sources] · 2026-06-01 14:35

Several research groups have proposed frameworks for zero-shot composed video retrieval (CoVR), the task of finding a target video given a reference video and a textual modification instruction. The methods, which address the Reason-Aware CoVR (CoVR-R) challenge at the CVPR 2026 VidLLMs workshop, use frozen foundation models to reason about the changes an instruction implies and to re-rank candidate videos. One approach, R3-CoVR, reports high accuracy by using a multimodal LLM to generate post-edit descriptions together with a constraint-aware re-ranker, while another, R^3, focuses on reasoning-guided recalling and re-ranking.
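The recall-then-re-rank pattern these summaries describe can be illustrated with a toy sketch. This is not any of the papers' implementations: the function names (`recall`, `rerank`), the two-dimensional embeddings, the captions, and the keyword-based constraint check are all hypothetical stand-ins. In a real system, a frozen multimodal LLM would produce the post-edit description and a learned encoder would produce the embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall(query_vec, gallery, k):
    """Stage 1: return the ids of the k gallery videos closest to the query."""
    ranked = sorted(gallery, key=lambda vid: cosine(query_vec, gallery[vid]),
                    reverse=True)
    return ranked[:k]


def rerank(candidates, captions, required_terms):
    """Stage 2: reorder candidates by how many required terms their caption
    contains. The sort is stable, so ties keep their recall order."""
    def satisfied(vid):
        words = set(captions[vid].lower().split())
        return sum(term in words for term in required_terms)
    return sorted(candidates, key=satisfied, reverse=True)


# Toy gallery: hand-written embeddings and captions (assumptions, not real data).
gallery = {"v1": [1.0, 0.0], "v2": [0.9, 0.1], "v3": [0.0, 1.0]}
captions = {
    "v1": "a man walks a dog in the park",
    "v2": "a man walks a dog in the snow",
    "v3": "a cat sleeps on a sofa",
}

# Stand-in for the embedding of an LLM-generated post-edit description.
query_vec = [1.0, 0.0]
# Stand-in for a constraint extracted from an edit such as "make it snowy".
required_terms = ["snow"]

top = recall(query_vec, gallery, k=2)
final = rerank(top, captions, required_terms)
print(top, final)
```

In this toy run the first stage favors the video most similar to the query embedding, and the second stage promotes the candidate whose caption satisfies the edit constraint. That is the general motivation for adding a constraint-aware re-ranker on top of similarity search.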

IMPACT Introduces new methods for video retrieval that leverage LLMs for reasoning, potentially improving search accuracy and flexibility.

RANK_REASON Multiple research papers presenting novel frameworks for a specific AI task.

AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

arXiv cs.LG TIER_1 English(EN) · Ali Alavi · 2026-06-02 04:00

Reason, Retrieve, Re-rank: A Zero-Shot Reasoning-Aware Framework for Composed Video Retrieval

arXiv:2606.00910v1 Announce Type: cross Abstract: Composed Video Retrieval (CoVR) seeks the target video that results from applying a free-form textual modification to a reference video. We address the Reason-Aware CoVR (CoVR-R) challenge at the CVPR 2026 VidLLMs workshop,…

arXiv cs.CV TIER_1 English(EN) · Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Weili Guan, Liqiang Nie · 2026-06-02 04:00

R^3: Composed Video Retrieval via Reasoning-Guided Recalling and Re-ranking

arXiv:2606.01113v1 Announce Type: new Abstract: The CoVR-R challenge evaluates composed video retrieval, where a system must retrieve a target video from a large gallery given a reference video and a textual edit instruction. This setting is not a standard video-text retrieval pr…

arXiv cs.CV TIER_1 English(EN) · Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang · 2026-06-02 04:00

Training-Free Composed Video Retrieval via Visual Representation-Guided Video-LLM Reasoning

arXiv:2606.02321v1 Announce Type: new Abstract: Recent advances in large vision-language models have expanded video retrieval from simple text-based search to more flexible scenarios, where users may specify the desired result through both visual examples and textual instructions…

arXiv cs.CV TIER_1 English(EN) · Qingming Huang · 2026-06-01 14:35

Training-Free Composed Video Retrieval via Visual Representation-Guided Video-LLM Reasoning

Recent advances in large vision-language models have expanded video retrieval from simple text-based search to more flexible scenarios, where users may specify the desired result through both visual examples and textual instructions. In the CVPR 2026 Reason-Aware Composed Video R…
