
TOOL · arXiv cs.AI English(EN) · 17h

MSUE: Multi-Modal Soccer Understanding Expert

Researchers have developed MSUE, a multi-expert system built for the 2026 SoccerNet VQA Challenge. The system uses a Vision-Language Model to synthesize training data and a Large Language Model to route each question to a specialized text, image, or video expert. By combining Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, MSUE achieved an accuracy of 0.95 on the challenge benchmark, placing third.
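
The routing idea in the summary can be illustrated with a small sketch. This is not the MSUE implementation: the `Question` class, the `route` function, and the three expert stubs are hypothetical names introduced here, and a simple check on which media is attached stands in for the LLM router the brief describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Question:
    """A VQA question with optional media attached (hypothetical structure)."""
    text: str
    image_path: Optional[str] = None
    video_path: Optional[str] = None


def route(question: Question) -> str:
    """Stand-in for the LLM router: choose an expert by attached modality.

    Assumption: the real system asks a language model to make this choice;
    here a rule-based check keeps the example self-contained.
    """
    if question.video_path is not None:
        return "video"
    if question.image_path is not None:
        return "image"
    return "text"


# Placeholder experts. In a real system each would call a model,
# for example a fine-tuned VLM or a knowledge-base lookup.
def text_expert(q: Question) -> str:
    return f"[text expert] {q.text}"


def image_expert(q: Question) -> str:
    return f"[image expert] {q.text} ({q.image_path})"


def video_expert(q: Question) -> str:
    return f"[video expert] {q.text} ({q.video_path})"


EXPERTS: Dict[str, Callable[[Question], str]] = {
    "text": text_expert,
    "image": image_expert,
    "video": video_expert,
}


def answer(question: Question) -> str:
    """Dispatch the question to whichever expert the router selects."""
    return EXPERTS[route(question)](question)


if __name__ == "__main__":
    print(answer(Question("Which club won the match?")))
    print(answer(Question("How many players are visible?", image_path="frame.jpg")))
    print(answer(Question("What foul occurred?", video_path="clip.mp4")))
```

Keeping the router separate from the experts means either side can be swapped out independently, for instance replacing the rule-based `route` with a call to a language model.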

IMPACT Demonstrates a multi-expert routing architecture for multimodal understanding that placed third in the SoccerNet VQA Challenge, which may influence the design of future VQA systems.

Qwen3-VL
Vision-Language Model
Large Language Model
SoccerNet VQA Challenge
Gemini3-Flash