New benchmark reveals Vision-Language Models struggle with script consistency

By PulseAugur Editorial · [2 sources] · 2026-06-15 18:25

A new benchmark, PuMVR (Punjabi Multimodal Visual Reasoning), evaluates Vision-Language Models (VLMs) on how well they handle multiple scripts within a single language. It comprises 1,000 parallel image-text instances across Punjabi's Gurmukhi, Shahmukhi, and Roman scripts. Testing 10 state-of-the-art VLMs on it reveals a significant "Script Gap": models often perform well in one script but fail in others, with accuracy differences of up to 16%. The researchers propose the Script Consistency Rate (SCR) as a metric for judging whether a VLM's performance is script-agnostic, with the goal of more equitable AI access.
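The summary does not say exactly how the Script Gap or SCR are computed. The following is a minimal sketch under stated assumptions, not the PuMVR authors' code or their SCR formula. It treats the Script Gap as the spread between a model's best and worst per-script accuracy. It treats a consistency rate, hypothetically, as the share of parallel instances for which the model gives the same answer in all three scripts. The record layout, function names, and toy data are all invented for illustration.

```python
from typing import Dict, List

# Hypothetical script keys for the three parallel versions of each instance.
SCRIPTS = ("gurmukhi", "shahmukhi", "roman")

# Hypothetical record layout: one gold answer shared by all three scripts,
# plus the model's answer for each script version of the same image-text item.
Record = Dict[str, object]


def per_script_accuracy(records: List[Record]) -> Dict[str, float]:
    """Fraction of instances answered correctly, computed separately per script."""
    if not records:
        raise ValueError("need at least one record")
    return {
        s: sum(r["answers"][s] == r["gold"] for r in records) / len(records)
        for s in SCRIPTS
    }


def script_gap(accuracy: Dict[str, float]) -> float:
    """Spread between the best and worst script (an assumed definition)."""
    return max(accuracy.values()) - min(accuracy.values())


def consistency_rate(records: List[Record]) -> float:
    """ASSUMED definition: share of instances where the model gives the same
    answer in every script, regardless of whether that answer is correct."""
    if not records:
        raise ValueError("need at least one record")
    same = sum(len({r["answers"][s] for s in SCRIPTS}) == 1 for r in records)
    return same / len(records)


# Invented toy data: four parallel instances, not results from the paper.
toy_records: List[Record] = [
    {"gold": "A", "answers": {"gurmukhi": "A", "shahmukhi": "A", "roman": "A"}},
    {"gold": "B", "answers": {"gurmukhi": "B", "shahmukhi": "C", "roman": "B"}},
    {"gold": "C", "answers": {"gurmukhi": "C", "shahmukhi": "C", "roman": "D"}},
    {"gold": "D", "answers": {"gurmukhi": "D", "shahmukhi": "A", "roman": "A"}},
]

if __name__ == "__main__":
    acc = per_script_accuracy(toy_records)
    print(acc)                            # {'gurmukhi': 1.0, 'shahmukhi': 0.5, 'roman': 0.5}
    print(script_gap(acc))                # 0.5
    print(consistency_rate(toy_records))  # 0.25
```

On the toy data, a model that scores perfectly in one script can still agree with itself across all three scripts on only a quarter of the instances. A per-instance consistency measure is designed to expose this kind of failure, which aggregate accuracy alone hides.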

IMPACT Highlights a critical limitation in current multilingual VLMs and may drive the development of more script-agnostic AI systems.

RANK_REASON The cluster contains an academic paper introducing a new benchmark and evaluation methodology for AI models.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Prabhjot Singh, Bhushan Pawar, Madhu Reddiboina, Rajvee Sheth · 2026-06-17 04:00

Not Truly Multilingual: Script Consistency as a Missing Dimension in VLM Evaluation

arXiv:2606.17188v1 · Announce Type: cross

Abstract: Current multilingual evaluations for Vision-Language Models (VLMs) assume a one-to-one mapping between language and orthography, overlooking billions of users of multi-script languages. We introduce PuMVR (Punjabi Multimodal Visua…

arXiv cs.CL TIER_1 English(EN) · Rajvee Sheth · 2026-06-15 18:25

Not Truly Multilingual: Script Consistency as a Missing Dimension in VLM Evaluation

Current multilingual evaluations for Vision-Language Models (VLMs) assume a one-to-one mapping between language and orthography, overlooking billions of users of multi-script languages. We introduce PuMVR (Punjabi Multimodal Visual Reasoning), a benchmark of 1,000 strictly parall…
