MSUE: Multi-Modal Soccer Understanding Expert
Researchers have developed MSUE, a multi-expert system designed for the 2026 SoccerNet VQA Challenge. The system uses a Vision-Language Model to synthesize training data and a Large Language Model to route each question to a specialized text, image, or video expert. By integrating Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, MSUE achieved 0.95 accuracy on the challenge benchmark, securing third place.
AI IMPACT: Demonstrates a novel multi-expert architecture for multimodal understanding, potentially influencing future VQA systems.
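
The routing idea described above (one component picks an expert, and that expert answers the question) can be illustrated with a minimal Python sketch. The summary does not say how the LLM router makes its decision or which model backs which expert, so everything here is a hypothetical stand-in: `Question`, `route`, the three expert functions, and the rule of routing by attached media are assumptions made for illustration, not the MSUE implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Question:
    """A VQA question with optional media (hypothetical structure)."""
    text: str
    image_path: Optional[str] = None
    video_path: Optional[str] = None


def route(question: Question) -> str:
    """Placeholder for the LLM router.

    Assumption: the expert is chosen from the attached media. The real
    system uses a Large Language Model for this decision.
    """
    if question.video_path is not None:
        return "video"
    if question.image_path is not None:
        return "image"
    return "text"


# Placeholder experts; in a real system each would call a model.
def text_expert(q: Question) -> str:
    return f"[text expert] {q.text}"


def image_expert(q: Question) -> str:
    return f"[image expert] {q.text} ({q.image_path})"


def video_expert(q: Question) -> str:
    return f"[video expert] {q.text} ({q.video_path})"


EXPERTS: Dict[str, Callable[[Question], str]] = {
    "text": text_expert,
    "image": image_expert,
    "video": video_expert,
}


def answer(question: Question) -> str:
    """Dispatch the question to the expert selected by the router."""
    return EXPERTS[route(question)](question)


if __name__ == "__main__":
    print(answer(Question("Which team won the 2014 World Cup?")))
    print(answer(Question("What foul occurred?", video_path="clip.mp4")))
```

Keeping the router separate from the experts means the placeholder rule in `route` could be swapped for a model call without touching the dispatch table.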