
arXiv:2606.02548v1. Abstract: Word Error Rate (WER) is the dominant metric for automatic speech recognition (ASR), but it can overestimate errors when references and hypotheses encode the same words in different scripts. This issue is common in multilingual settings where ASR models may emit romanized text. We propose Script-Normalized WER (SN-WER), a training-free, evaluation-only scoring method that transliterates both reference and hypothesis text into a language-specific canonical script before computing WER. We evaluate SN-WER on 5 Indic languages, 2 datasets, and 3 ASR
The proliferation of multilingual ASR systems, particularly for languages that are commonly written in more than one script, such as the Indic languages, makes accurate and fair evaluation metrics critically important.
This development improves the reliability of ASR evaluation, which is vital for developing robust AI models capable of handling linguistic diversity and for benchmarking progress in multilingual AI.
By normalizing scripts before scoring, ASR evaluation for multi-script languages can separate genuine recognition errors from mismatches that arise only because the reference and the hypothesis are written in different scripts, which reduces the overestimation of error rates.
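The idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say which transliteration tool SN-WER uses, so the sketch substitutes a tiny hand-written lookup table (`TOY_ROMAN_TO_DEVANAGARI`, hypothetical) for a real transliterator. The function names `to_canonical`, `wer`, and `sn_wer` are likewise assumptions made for illustration.

```python
import unicodedata
from typing import Callable, Dict, List

# Hypothetical toy lexicon mapping romanized Hindi to Devanagari.
# A real evaluation would use a proper transliteration system instead.
TOY_ROMAN_TO_DEVANAGARI: Dict[str, str] = {
    "mera": "मेरा",
    "naam": "नाम",
    "hai": "है",
}


def to_canonical(text: str) -> List[str]:
    """Tokenize on whitespace and map each token to the canonical script."""
    tokens = unicodedata.normalize("NFC", text).lower().split()
    # Tokens missing from the toy lexicon are passed through unchanged.
    return [TOY_ROMAN_TO_DEVANAGARI.get(tok, tok) for tok in tokens]


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref_tokens: List[str], hyp_tokens: List[str]) -> float:
    """Standard WER: edit distance divided by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def sn_wer(reference: str, hypothesis: str,
           normalize: Callable[[str], List[str]] = to_canonical) -> float:
    """Normalize both sides to one script, then score with ordinary WER."""
    return wer(normalize(reference), normalize(hypothesis))


reference = "मेरा नाम राम है"
hypothesis = "mera naam राम hai"  # same words, three of them romanized

plain = wer(reference.split(), hypothesis.split())
normalized = sn_wer(reference, hypothesis)
print(f"plain WER = {plain:.2f}, script-normalized WER = {normalized:.2f}")
```

In this toy case the hypothesis contains the same four words as the reference, yet plain WER counts the three romanized tokens as substitutions, while the script-normalized score counts none. Applying the same normalization to both reference and hypothesis, as the abstract describes, keeps the comparison symmetric.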
- Multilingual ASR developers
- Users of Indic language voice interfaces
- AI researchers in speech recognition
- Language technology companies
- ASR models with poor multi-script handling
Improved ASR models for Indic and other multi-script languages.
Increased adoption and utility of voice interfaces in diverse linguistic regions.
Enhanced AI accessibility and inclusion for a broader global population, potentially contributing to 'sovereign AI' efforts through more accurate domestic language models.
Read at arXiv cs.CL