SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain
Researchers have introduced SVFSearch, a new benchmark for evaluating multimodal large language models on short-video frame search in the Chinese gaming domain. The benchmark includes 5,000 test examples and 4,198 training examples built from paused game scenes taken from real short-video clips. SVFSearch provides a controlled environment, with a fixed game-domain corpus and image gallery, so that evaluations are reproducible. Results on the benchmark show a significant gap between model performance and the oracle-knowledge setting, and point to weaknesses in visual grounding and retrieval.
AI IMPACT: This benchmark aims to drive improvements in how multimodal LLMs understand and retrieve information from short videos, particularly in specialized domains such as gaming.