PulseAugur

New MSUE system achieves 0.95 accuracy in SoccerNet VQA Challenge

Researchers have developed MSUE (Multi-Modal Soccer Understanding Expert), a multi-expert system for the 2026 SoccerNet VQA Challenge. The system uses a Vision-Language Model (VLM) to synthesize training data and a Large Language Model (LLM) to route each question to a specialized text, image, or video expert. By integrating Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, MSUE achieved 0.95 accuracy on the challenge benchmark, placing third.
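The routing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Question` fields, the `route` function, and the three expert functions are hypothetical names introduced here, and a simple rule on attached media stands in for the LLM router that MSUE reportedly uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Question:
    """A VQA question with optional media (hypothetical structure)."""
    text: str
    image_path: Optional[str] = None
    video_path: Optional[str] = None


def route(question: Question) -> str:
    """Stand-in for the LLM router: choose an expert by attached media.

    Assumption: the real system asks an LLM to make this decision; this
    rule-based stub only shows the dispatch pattern.
    """
    if question.video_path is not None:
        return "video"
    if question.image_path is not None:
        return "image"
    return "text"


# Placeholder experts. In the described system these would wrap models such
# as a fine-tuned VLM or a knowledge-base lookup; here they only tag the input.
def text_expert(q: Question) -> str:
    return f"[text expert] {q.text}"


def image_expert(q: Question) -> str:
    return f"[image expert] {q.text} ({q.image_path})"


def video_expert(q: Question) -> str:
    return f"[video expert] {q.text} ({q.video_path})"


EXPERTS: Dict[str, Callable[[Question], str]] = {
    "text": text_expert,
    "image": image_expert,
    "video": video_expert,
}


def answer(question: Question) -> str:
    """Route the question, then delegate to the selected expert."""
    return EXPERTS[route(question)](question)


if __name__ == "__main__":
    print(answer(Question("Which club won the league?")))
    print(answer(Question("What foul is shown?", video_path="clip.mp4")))
```

Keeping the router separate from the experts means either side can be swapped independently, for example replacing the rule in `route` with a call to a language model without touching the expert table.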

IMPACT Demonstrates a novel multi-expert architecture for multimodal understanding, potentially influencing future VQA systems.

RANK_REASON The cluster contains an academic paper detailing a new model architecture and its performance on a specific benchmark.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI · TIER_1 · English (EN) · Yixi Zhou

    MSUE: Multi-Modal Soccer Understanding Expert

    This paper presents our solution to the 2026 SoccerNet VQA Challenge. We first develop a cost-effective data synthesis pipeline driven by a Vision-Language Model (VLM), which systematically restructures raw domain data into diverse VQA samples, including concise answers and long-…