AI models struggle with Devanagari script OCR, new benchmark reveals

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

A new benchmark study evaluated ten OCR systems, including specialized OCR-VLMs and frontier multimodal LLMs, on Devanagari script. Many systems perform well on clean synthetic text, but their accuracy drops significantly on degraded inputs and on real-world scans. Specialized OCR-VLMs proved particularly fragile, with DeepSeek-OCR exhibiting catastrophic repetition failures. Notably, strong performance on English OCR did not correlate with performance on Indic scripts; models such as GPT-5.5 showed a substantial drop.

IMPACT Highlights limitations of current multimodal models on non-English scripts, indicating a need for improved multilingual capabilities and robustness.

RANK_REASON Academic paper presenting a new benchmark and study on OCR performance for a specific script.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Aditya Pratap Singh · 2026-06-30 04:00

Can OCR-VLMs Read Devanagari? A Stress-Test Benchmark and Post-Correction Study

arXiv:2606.29213v1. Abstract: OCR systems, ranging from classical engines to specialised OCR vision-language models (OCR-VLMs) and frontier multimodal LLMs, report strong results on English and Chinese document benchmarks, yet their behaviour on Indic scripts is…
