PulseAugur / Brief



Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding

    Researchers have developed UrduMMLU, a new benchmark for evaluating how well large language models understand Urdu. It consists of over 26,000 multiple-choice questions across 26 subjects, sourced from native educational materials. In the reported evaluations, Gemini-3.5-Flash leads in performance, while many other models, particularly open-source ones, show significant knowledge gaps, especially in the humanities and in region-specific content.

    IMPACT: Highlights uneven Urdu language understanding across LLMs, particularly for region-specific content, which can guide future model development.