Vortex: Multi-Modal Fusion System for Intelligent Video Retrieval
The Vortex system, developed by the FocusOnFun team for the Ho Chi Minh City AI Challenge 2025, performs intelligent video retrieval through multi-modal fusion. It combines adaptive keyframe extraction, metadata generated by vision-language and speech models, and a hybrid retrieval strategy that uses both CLIP and SigLIP2 embeddings. The system also features Rocchio-based relevance feedback and a multi-stage temporal search mechanism, and it is built on Milvus and Elasticsearch for scalability. The team achieved excellent performance in the competition, which points to the effectiveness of the hybrid approach.
AI IMPACT: This system advances intelligent multimedia search and temporal reasoning in video retrieval.
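
As a rough illustration of two ideas named in the summary, the sketch below shows a textbook Rocchio query update and a simple weighted fusion of similarity scores from two embedding spaces. It is not the Vortex implementation: the function names, the weights (alpha, beta, gamma, and the fusion weight), and the use of plain Python lists in place of real CLIP or SigLIP2 vectors are all assumptions made for the example.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def mean(vectors, dim):
    """Component-wise mean; an empty list yields the zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update: move the query toward the centroid of results
    marked relevant and away from the centroid of results marked non-relevant.
    The default weights are common textbook values, not values from Vortex."""
    dim = len(query)
    pos = mean(relevant, dim)
    neg = mean(non_relevant, dim)
    updated = [alpha * q + beta * p - gamma * n
               for q, p, n in zip(query, pos, neg)]
    return normalize(updated)


def fused_score(query_a, frame_a, query_b, frame_b, weight=0.5):
    """Hypothetical late fusion: a weighted sum of cosine similarities computed
    separately in two embedding spaces (for example one per model)."""
    return (weight * cosine(query_a, frame_a)
            + (1 - weight) * cosine(query_b, frame_b))
```

In use, a user would mark a few returned keyframes as relevant or not, the query embedding would be passed through rocchio once per embedding space, and the candidates would be re-ranked with fused_score. How Vortex actually weights or combines the two models is not stated in the summary.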