Brief · PulseAugur

TOOL · arXiv cs.CV · 11h

Vortex: Multi-Modal Fusion System for Intelligent Video Retrieval

The Vortex system, developed by the FocusOnFun team for the Ho Chi Minh City AI Challenge 2025, applies multi-modal fusion to intelligent video retrieval. It combines adaptive keyframe extraction, metadata generated by vision-language and speech models, and a hybrid retrieval strategy that draws on both CLIP and SigLIP2 embeddings. The system also provides Rocchio-based relevance feedback and a multi-stage temporal search mechanism, and it is built on Milvus and Elasticsearch for scalability. The team reports strong results in the competition, which it presents as evidence for the hybrid approach.
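To make the two retrieval ideas named above concrete, here is a minimal Python sketch of Reciprocal Rank Fusion and a Rocchio query update. It is illustrative only and is not the Vortex implementation. The summary does not say how the CLIP and SigLIP2 results are merged, so applying Reciprocal Rank Fusion to them is an assumption based on the topic tags. The function names, the constant `k=60`, and the weights `alpha`, `beta`, and `gamma` are also assumptions (common textbook defaults), not values reported by the authors.

```python
import math
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of item ids: score(d) = sum over lists of 1 / (k + rank).

    `rankings` is a list of lists, each ordered best-first.
    k=60 is a commonly used default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query embedding toward relevant items and away from non-relevant ones.

    q' = alpha * q + beta * mean(relevant) - gamma * mean(non_relevant),
    then L2-normalised so it can be reused for cosine-similarity search.
    The weights are textbook defaults, assumed for illustration.
    """
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    positive = centroid(relevant)
    negative = centroid(non_relevant)
    updated = [alpha * q + beta * p - gamma * n
               for q, p, n in zip(query, positive, negative)]
    norm = math.sqrt(sum(x * x for x in updated))
    return [x / norm for x in updated] if norm > 0 else updated


# Hypothetical keyframe ids ranked by two different embedding models.
clip_ranking = ["a", "b", "c"]
siglip2_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([clip_ranking, siglip2_ranking])

# One round of feedback: the user marks a single keyframe as relevant.
refined_query = rocchio_update([1.0, 0.0], relevant=[[0.0, 1.0]], non_relevant=[])
```

Rank-based fusion avoids having to calibrate similarity scores across two embedding spaces, which is one plausible reason to pair it with a dual-model setup. In a deployed system the refined query vector would be sent back to the vector index for a second search pass.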

IMPACT This system advances intelligent multimedia search and temporal reasoning in video retrieval.

Hugging Face
Elasticsearch
Milvus
Reciprocal Rank Fusion
SigLIP2
Vortex
FocusOnFun
Ho Chi Minh City AI Challenge 2025
Rocchio