
arXiv:2606.29213v1 Abstract: OCR systems, ranging from classical engines to specialised OCR vision-language models (OCR-VLMs) and frontier multimodal LLMs, report strong results on English and Chinese document benchmarks, yet their behaviour on Indic scripts is largely uncharacterised. We benchmark ten systems on Devanagari (Hindi): classical EasyOCR; open VLMs (Qwen2.5-VL-3B, Qwen3-VL-8B, olmOCR-7B); specialised OCR-VLMs (DeepSeek-OCR, Unlimited-OCR); and frontier closed models (Gemini 2.5 Flash, Claude Opus 4.7, GPT-5.5, Mistral OCR), across four synthetic degradation cond
The spread of advanced AI models has exposed their limitations outside dominant languages, creating a need for benchmarks and improvements in underserved settings such as Indic scripts.
This research offers insight into the performance gap of leading AI models on non-English languages, underscoring the need for more inclusive AI development and for assessing how accessible advanced AI is worldwide.
The focus expands from primarily English and Chinese benchmarks to rigorous testing and post-correction studies for Indic scripts, revealing specific weaknesses of, and opportunities for, OCR-VLMs.
- Indic language users
- Developers of multilingual AI
- Companies specializing in regional AI solutions
- Academic researchers in NLP/OCR
- Monolingual English/Chinese AI models
- Outdated OCR technologies
- Companies relying solely on generic AI solutions for global markets
Improved OCR and VLM performance for Devanagari and other Indic scripts will become a key competitive differentiator.
AI becomes more accessible and useful for a significant portion of the global population, fostering local innovation and digital inclusion.
The development of regionally optimized AI models could lead to a less centralized AI ecosystem, potentially impacting global technology hubs and fostering sovereign AI initiatives.
Read at arXiv cs.CL