New UrduMMLU benchmark reveals LLM knowledge gaps

By PulseAugur Editorial · [2 sources] · 2026-06-05 11:35

Researchers have developed UrduMMLU, a new benchmark for evaluating how well large language models understand Urdu. It comprises 26,431 multiple-choice questions across 26 subjects, drawn from native educational materials. In the reported evaluations, Gemini-3.5-Flash leads, while many other models, particularly open-source ones, show significant knowledge gaps, especially in the humanities and in region-specific content.

IMPACT Highlights uneven Urdu language understanding in LLMs, particularly for region-specific content, guiding future model development.




AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Ahmer Tabassum, Sarfraz Ahmad, Hasan Iqbal, Owais Aijaz, Momina Ahsan, Preslav Nakov · 2026-06-08 04:00

UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding

arXiv:2606.07167v1. Abstract: Meaningful multilingual evaluation must test models in the target language and educational context. Urdu, spoken by more than 230 million people, lacks a broad MMLU-style benchmark built from native educational sources. We introdu…

arXiv cs.CL TIER_1 English(EN) · Preslav Nakov · 2026-06-05 11:35

UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding

Meaningful multilingual evaluation must test models in the target language and educational context. Urdu, spoken by more than 230 million people, lacks a broad MMLU-style benchmark built from native educational sources. We introduce UrduMMLU, a benchmark of 26,431 Urdu MCQs acros…

