RESEARCH · arXiv cs.AI · English (EN) · 1w · [2 sources]

UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding

Researchers have introduced UrduMMLU, a benchmark for evaluating how well large language models understand Urdu. It comprises more than 26,000 multiple-choice questions across 26 subjects, sourced from native educational materials. In the reported evaluations, Gemini-3.5-Flash performs best, while many other models, particularly open-source ones, show significant knowledge gaps, especially in the humanities and in region-specific content.

IMPACT Highlights uneven Urdu understanding across LLMs, particularly on region-specific content, which can guide future model development.

LLMs
Gemini-3.5-Flash
UrduMMLU