MSUE: Multi-Modal Soccer Understanding Expert
Researchers have developed MSUE, a multi-expert system for answering soccer-related questions over multi-modal data. The system uses a Vision-Language Model for data synthesis and a Large Language Model to route each query to a specialized text, image, or video expert. By integrating Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, MSUE achieved an accuracy of 0.95 on the 2026 SoccerNet VQA Challenge, securing third place.
AI IMPACT: Demonstrates advanced multi-modal reasoning for sports analytics, potentially improving automated commentary and fan engagement tools.
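
The routing idea in the summary (one component decides which of the text, image, or video experts handles a query) can be illustrated with a minimal sketch. This is not the MSUE implementation: the summary says an LLM performs the routing, while the `route` function below is a simple stand-in that chooses by attached media. The names `Query`, `route`, `EXPERTS`, and `answer`, and the placeholder expert functions, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Query:
    """A soccer question with optional media (hypothetical structure)."""
    question: str
    image_path: Optional[str] = None
    video_path: Optional[str] = None


def route(query: Query) -> str:
    """Stand-in for the LLM router: choose an expert by attached modality.

    Assumption: the real system asks an LLM to make this decision;
    here a fixed rule is used so the sketch runs without any model.
    """
    if query.video_path is not None:
        return "video"
    if query.image_path is not None:
        return "image"
    return "text"


# Placeholder experts. In the described system these would be backed by
# models and an external knowledge base; here they only tag the question.
def text_expert(query: Query) -> str:
    return f"[text expert] {query.question}"


def image_expert(query: Query) -> str:
    return f"[image expert] {query.question} ({query.image_path})"


def video_expert(query: Query) -> str:
    return f"[video expert] {query.question} ({query.video_path})"


EXPERTS: Dict[str, Callable[[Query], str]] = {
    "text": text_expert,
    "image": image_expert,
    "video": video_expert,
}


def answer(query: Query) -> str:
    """Dispatch the query to whichever expert the router selects."""
    return EXPERTS[route(query)](query)


if __name__ == "__main__":
    print(answer(Query("Which country hosted the 2014 World Cup?")))
    print(answer(Query("Was this a foul?", video_path="clip.mp4")))
```

Keeping the router separate from the experts means the routing rule could be swapped for a model call without changing the dispatch table.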