Researchers have introduced MuDABench, a new benchmark for analytical question answering over large document collections. The benchmark requires systems to synthesize information from many sources to perform quantitative analysis, a task that current retrieval-augmented generation (RAG) systems struggle with. A proposed multi-agent workflow improves results but still falls short of human expert performance, highlighting challenges in information extraction and domain-specific knowledge.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT: Highlights limitations of current RAG systems on complex analytical QA, suggesting areas for future research and development.
RANK_REASON: This is a research paper introducing a new benchmark for a specific AI task.