Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition

arXiv:2605.13087v2. Abstract: Fine-tuning multilingual ASR models like Whisper for low-resource languages often improves read speech but degrades spontaneous audio performance. To diagnose this mismatch, we introduce Vividh-ASR, a complexity-stratified benchmark for Hindi and Malayalam across four tiers: studio, broadcast, spontaneous, and synthetic noise. Through a controlled study of learning-rate timing and curriculum ordering, we find that early large parameter updates improve global WER by 12 absolute points, while a hard-to-easy curriculum adds gains for spont
As ASR models proliferate and are applied to more diverse linguistic contexts, optimizing them for real-world conditions requires deeper experimentation, particularly for low-resource languages.
This research provides a benchmark and empirical findings aimed at more robust and accurate speech recognition in Indic languages, which bears on the accessibility and utility of AI for a significant portion of the global population.
The work improves understanding of how to fine-tune multilingual ASR models for spontaneous speech in low-resource languages, which could lead to more effective deployment of AI-powered voice assistants and transcription services.
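The abstract states its headline gain in absolute WER (word error rate) points. As a minimal sketch of what that metric measures, the Python below computes WER as word-level edit distance divided by reference length, then contrasts an absolute with a relative improvement. The function names, the sample sentences, and the 0.40 and 0.28 figures are hypothetical illustrations, not values or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_points(wer_before: float, wer_after: float) -> float:
    """Improvement in absolute WER points (percentage points)."""
    return round((wer_before - wer_after) * 100, 6)


def relative_percent(wer_before: float, wer_after: float) -> float:
    """Improvement relative to the starting WER, in percent."""
    return round((wer_before - wer_after) / wer_before * 100, 6)


# Hypothetical transcripts: one substitution ("sat" -> "sit") and one deletion ("the").
sample_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Hypothetical before/after WERs: the same change is 12 absolute points
# but a 30 percent relative reduction.
gain_abs = absolute_points(0.40, 0.28)
gain_rel = relative_percent(0.40, 0.28)
```

The distinction matters when reading the claim: a 12-point absolute gain is a much larger relative improvement when the starting WER is low than when it is high, so the baseline is needed to interpret it.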
- AI developers focused on Indic languages
- Users of voice AI in India and South Asia
- Speech recognition technology companies
- Research institutions in AI/ML
- Generic multilingual ASR models without specific optimization
Improved performance of ASR systems for Hindi and Malayalam, especially in real-world conversational settings.
Accelerated adoption of voice interfaces and AI-driven services, driven by enhanced reliability, in regions where these languages are dominant.
Reduced digital divide for non-English speaking populations, fostering greater participation in the global digital economy through improved language technology.
Read at arXiv cs.AI