Brief · PulseAugur

RESEARCH · arXiv cs.AI · 21h · 3 sources

MSUE: Multi-Modal Soccer Understanding Expert

Researchers have developed MSUE, a multi-expert system designed to answer soccer-related questions over multi-modal data. The system uses a Vision-Language Model for data synthesis and a Large Language Model to route each query to a specialized text, image, or video expert. By integrating Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, MSUE achieved an accuracy of 0.95 on the 2026 SoccerNet VQA Challenge, placing third.
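The routing step described above can be illustrated with a small sketch. This is a hypothetical example, not MSUE's code: the `Query` type, the expert names, and the placeholder experts are assumptions, and the default router picks an expert by which modality is attached to the question. That rule-based function is only a stand-in for the LLM router the paper describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical query container: a question plus optional image/video payloads.
@dataclass
class Query:
    question: str
    image: Optional[bytes] = None
    video: Optional[bytes] = None

Expert = Callable[[Query], str]
Router = Callable[[Query], str]

def modality_router(query: Query) -> str:
    """Rule-based stand-in for an LLM router: choose an expert by attached modality."""
    if query.video is not None:
        return "video"
    if query.image is not None:
        return "image"
    return "text"

def answer(query: Query, experts: Dict[str, Expert],
           router: Router = modality_router) -> str:
    """Route the query to one expert and return that expert's answer."""
    return experts[router(query)](query)

# Hypothetical placeholder experts; real ones would wrap model calls.
EXPERTS: Dict[str, Expert] = {
    "text": lambda q: f"[text expert] {q.question}",
    "image": lambda q: f"[image expert] {q.question}",
    "video": lambda q: f"[video expert] {q.question}",
}

if __name__ == "__main__":
    q = Query("Who scored the opening goal?", video=b"clip")
    print(answer(q, EXPERTS))
```

Because the router is passed in as a parameter, a function that prompts a language model to choose among the expert names could replace `modality_router` without changing `answer`.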

IMPACT · Demonstrates advanced multi-modal reasoning for sports analytics, potentially improving automated commentary and fan-engagement tools.

Qwen3-VL
Vision-Language Model
Large Language Model
SoccerNet VQA Challenge
Gemini3-Flash
2026 SoccerNet VQA Challenge