A new benchmark, PuMVR, evaluates Vision-Language Models (VLMs) on their ability to handle multiple scripts within a single language. It comprises 1,000 parallel image-text instances across Punjabi's Gurmukhi, Shahmukhi, and Roman scripts and reveals a significant 'Script Gap' in 10 state-of-the-art VLMs: models often perform well in one script but fail in others, with accuracy differences of up to 16%. The research proposes the Script Consistency Rate (SCR) as a metric for evaluating script-agnostic VLM performance and ensuring equitable AI access.
AI IMPACT Highlights a critical limitation in current multilingual VLMs, potentially driving development of more script-agnostic AI systems.
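The summary does not define how the Script Gap or the Script Consistency Rate is computed, so the following is only a rough sketch under stated assumptions: the gap is taken as the difference between the best and worst per-script accuracy, and SCR as the share of instances a model answers correctly in all three scripts. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

SCRIPTS = ("gurmukhi", "shahmukhi", "roman")

# One dict per benchmark instance: script name -> whether the model was correct.
Result = Dict[str, bool]


def script_accuracy(results: List[Result], script: str) -> float:
    """Fraction of instances answered correctly in the given script."""
    return sum(r[script] for r in results) / len(results)


def script_gap(results: List[Result]) -> float:
    """Assumed definition: best per-script accuracy minus worst."""
    accuracies = [script_accuracy(results, s) for s in SCRIPTS]
    return max(accuracies) - min(accuracies)


def script_consistency_rate(results: List[Result]) -> float:
    """Assumed definition: share of instances correct in every script."""
    consistent = sum(1 for r in results if all(r[s] for s in SCRIPTS))
    return consistent / len(results)


# Made-up toy data for illustration only, not benchmark results.
toy_results: List[Result] = [
    {"gurmukhi": True, "shahmukhi": True, "roman": True},
    {"gurmukhi": True, "shahmukhi": False, "roman": True},
    {"gurmukhi": True, "shahmukhi": True, "roman": False},
    {"gurmukhi": False, "shahmukhi": False, "roman": False},
]

print(script_gap(toy_results))
print(script_consistency_rate(toy_results))
```

Under these assumptions, a model can score well on average in each script and still have a low SCR if its correct answers fall on different instances from one script to the next, which is the kind of inconsistency a per-script accuracy figure alone would hide.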
RANK_REASON The cluster contains an academic paper introducing a new benchmark and evaluation methodology for AI models.
AI-generated summary · Google Gemini · from 2 sources.