New MSUE system achieves 0.95 accuracy in SoccerNet VQA Challenge

By PulseAugur Editorial · [1 source] · 2026-06-10 14:00

Researchers have developed MSUE, a multi-expert system built for the 2026 SoccerNet VQA Challenge. The system uses a Vision-Language Model to synthesize training data and a Large Language Model to route each question to a specialized text, image, or video expert. By integrating Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, MSUE achieved 0.95 accuracy on the challenge benchmark, securing third place.
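The routing step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary says an LLM performs the routing, whereas this sketch substitutes a simple rule based on which media are attached. The names `Question`, `route`, `EXPERTS`, and `answer` are all hypothetical, and the experts are placeholder functions rather than real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Question:
    """A VQA query with optional media attachments (hypothetical structure)."""
    text: str
    image_path: Optional[str] = None
    video_path: Optional[str] = None


def route(question: Question) -> str:
    """Placeholder router: chooses an expert from the attached modality.

    ASSUMPTION: the real system uses an LLM for this decision; a fixed rule
    stands in for it here so the example runs without any model.
    """
    if question.video_path is not None:
        return "video"
    if question.image_path is not None:
        return "image"
    return "text"


# Placeholder experts. In a real system each would call a model
# (or a knowledge base) instead of returning a tagged string.
def text_expert(q: Question) -> str:
    return f"[text expert] {q.text}"


def image_expert(q: Question) -> str:
    return f"[image expert] {q.text} ({q.image_path})"


def video_expert(q: Question) -> str:
    return f"[video expert] {q.text} ({q.video_path})"


EXPERTS: Dict[str, Callable[[Question], str]] = {
    "text": text_expert,
    "image": image_expert,
    "video": video_expert,
}


def answer(question: Question) -> str:
    """Dispatch the question to whichever expert the router selects."""
    return EXPERTS[route(question)](question)
```

Keeping the router separate from the experts means the rule-based `route` could be swapped for a model-backed one without touching the dispatch table.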

IMPACT Demonstrates a novel multi-expert architecture for multimodal understanding, potentially influencing future VQA systems.

RANK_REASON The cluster contains an academic paper detailing a new model architecture and its performance on a specific benchmark.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Yixi Zhou · 2026-06-10 14:00

MSUE: Multi-Modal Soccer Understanding Expert

This paper presents our solution to the 2026 SoccerNet VQA Challenge. We first develop a cost-effective data synthesis pipeline driven by a Vision-Language Model (VLM), which systematically restructures raw domain data into diverse VQA samples, including concise answers and long-…

