PulseAugur

VideoSearch-R1 framework refines video search with latent space queries

Researchers have introduced VideoSearch-R1, an agentic framework for video retrieval and reasoning. The system interacts iteratively with a video search engine and uses a technique called Soft Query Refinement (SQR) to adjust its search queries in a continuous latent space rather than by rewriting them as text. It is trained with Group Relative Policy Optimization (GRPO) and is reported to achieve state-of-the-art performance on Video Corpus Moment Retrieval (VCMR) benchmarks while generating fewer tokens than text-based query refinement.
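The two ideas named above can be made concrete with a toy sketch. This is not the paper's implementation: the summary does not say how SQR computes its update, so the hypothetical refine_query below swaps in a simple Rocchio-style step that nudges a query embedding toward the centroid of its top-ranked results. The group_relative_advantages helper shows the group-normalised advantage commonly associated with GRPO, using the population standard deviation (implementations differ on this detail). All function names, vectors, and parameters are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, corpus, k):
    """Indices of the k corpus vectors most similar to the query."""
    ranked = sorted(range(len(corpus)),
                    key=lambda i: cosine(query, corpus[i]),
                    reverse=True)
    return ranked[:k]


def refine_query(query, corpus, k=2, step=0.5):
    """Hypothetical latent-space refinement (an assumption, not SQR itself):
    interpolate the query toward the centroid of its top-k hits."""
    hits = top_k(query, corpus, k)
    centroid = [mean(corpus[i][d] for i in hits) for d in range(len(query))]
    return [(1 - step) * q + step * c for q, c in zip(query, centroid)]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy 2-D "video embeddings" and an initial query embedding.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [0.6, 0.4]

refined = refine_query(query, corpus)
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

The point of the sketch is that the query never passes back through text: each round edits a vector directly, which is one plausible reason a latent-space approach would need fewer generated tokens than rewriting the query in words.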

IMPACT Refining queries in a latent space instead of as generated text could make video search and analysis systems both more accurate and more efficient.

RANK_REASON The cluster describes a new research paper detailing a novel framework and technique for video retrieval and reasoning.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. arXiv cs.AI TIER_1 English(EN) · Seohyun Lee, Seoung Choi, Dohwan Ko, Jongha Kim, Hyunwoo J. Kim

    VideoSearch-R1: Iterative Video Retrieval and Reasoning via Soft Query Refinement

    arXiv:2607.00446v1. As video corpora continue to expand in both scale and task complexity, there is increasing demand for approaches that retrieve relevant videos from large-scale corpora (inter-video reasoning) and subsequently perform fine-grained,…

  2. arXiv cs.AI TIER_1 English(EN) · Hyunwoo J. Kim

    VideoSearch-R1: Iterative Video Retrieval and Reasoning via Soft Query Refinement

    As video corpora continue to expand in both scale and task complexity, there is increasing demand for approaches that retrieve relevant videos from large-scale corpora (inter-video reasoning) and subsequently perform fine-grained, query-conditioned tasks (intra-video reasoning) w…

  3. Hugging Face Daily Papers TIER_1 English(EN)

    VideoSearch-R1: Iterative Video Retrieval and Reasoning via Soft Query Refinement

    VideoSearch-R1 is an agentic framework that iteratively retrieves videos and refines search queries using continuous latent space refinement and policy optimization for improved video moment retrieval and temporal grounding.