PulseAugur
research · 2 sources

RaguTeam wins SemEval-2026 LLM task with judge-orchestrated ensemble

RaguTeam's system won Task B (generation with reference passages) of SemEval-2026 Task 8 (MTRAGEval), which focuses on faithful multi-turn response generation. The approach is a heterogeneous ensemble of seven large language models run with two prompting variants, with a GPT-4o-mini judge selecting the best candidate response for each instance. The ensemble outperformed 26 other teams with a harmonic mean of 0.7827, a result that points to the value of combining diverse model families and prompting strategies.
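The judge-based selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `select_best` and `overlap_judge`, the two stub generators, and the token-overlap scoring are all hypothetical and introduced here only for the example. In the described system the candidates come from seven LLMs and the judge is GPT-4o-mini.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases: a generator maps (question, passages) to an answer,
# a judge maps (question, passages, answer) to a score where higher is better.
Generator = Callable[[str, List[str]], str]
Judge = Callable[[str, List[str], str], float]


def tokens(text: str) -> List[str]:
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"\w+", text.lower())


def overlap_judge(question: str, passages: List[str], answer: str) -> float:
    """Stand-in judge (assumption): fraction of answer tokens found in the passages.

    A real judge would be an LLM call; this heuristic only keeps the sketch runnable.
    """
    passage_vocab = set(tokens(" ".join(passages)))
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    supported = sum(1 for t in answer_tokens if t in passage_vocab)
    return supported / len(answer_tokens)


def select_best(
    question: str,
    passages: List[str],
    generators: Dict[str, Generator],
    judge: Judge,
) -> Tuple[str, str]:
    """Collect one candidate per generator and return the (name, answer) the judge scores highest."""
    if not generators:
        raise ValueError("at least one generator is required")
    candidates = {name: gen(question, passages) for name, gen in generators.items()}
    best = max(candidates, key=lambda name: judge(question, passages, candidates[name]))
    return best, candidates[best]


# Stub generators standing in for the ensemble members (hypothetical outputs).
generators: Dict[str, Generator] = {
    "model_a": lambda q, p: "Paris is the capital of France",
    "model_b": lambda q, p: "I think it might be Lyon",
}
passages = ["Paris is the capital of France."]

winner, answer = select_best(
    "What is the capital of France?", passages, generators, overlap_judge
)
print(winner, answer)
```

Because selection happens per instance, a different ensemble member can win on each question; swapping `overlap_judge` for a call to an actual LLM judge would not change the structure of `select_best`.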

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Demonstrates an effective ensemble strategy for multi-turn response generation, potentially influencing future research in faithful dialogue systems.

RANK_REASON This is a research paper detailing a system's performance in a specific academic task.


COVERAGE [2]

  1. arXiv cs.LG TIER_1 · Ivan Bondarenko, Roman Derunets, Oleg Sedukhin, Mikhail Komarov, Ivan Chernov, Mikhail Kulakov

    RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation

    arXiv:2605.04523v1 (cross-list). Abstract: We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects t…

  2. arXiv cs.CL TIER_1 · Mikhail Kulakov

    RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation

    We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out …