Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages

arXiv:2604.21481v2 (Announce Type: replace). Abstract: Crowdsourced pairwise evaluation has emerged as a scalable approach for assessing foundation models. However, applying it to Text-to-Speech (TTS) introduces high variance due to linguistic diversity and the multidimensional nature of speech perception. We present a controlled, multidimensional pairwise evaluation framework for multilingual TTS that combines linguistic control with perceptually grounded annotation. Using 5K+ native and code-mixed sentences across 10 Indic languages, we evaluate 7 state-of-the-art TTS systems and collect over 120K pa…
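The brief does not say how the 120K+ pairwise judgments are aggregated into system rankings. A common choice for crowdsourced pairwise evaluation is the Bradley-Terry model, fit separately for each perceptual dimension. The sketch below assumes that approach and is not necessarily the paper's method; the function names `bradley_terry` and `rank_by_dimension`, the dimension labels, and the system names are hypothetical. It also assumes every judgment picks a winner (no "tie" option) and that each system wins at least once per dimension.

```python
from collections import defaultdict


def bradley_terry(comparisons, iters=500, tol=1e-10):
    """Fit Bradley-Terry strengths with MM (minorize-maximize) updates.

    comparisons: iterable of (winner, loser) system-name pairs.
    Returns a dict mapping system -> strength, normalized to sum to 1.
    Assumption (hypothetical setup): no ties, no self-comparisons.
    """
    wins = defaultdict(float)   # total wins per system
    pair_counts = defaultdict(float)  # comparisons per unordered pair
    systems = set()
    for winner, loser in comparisons:
        if winner == loser:
            continue  # skip malformed self-comparisons
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        systems.update((winner, loser))

    # Start from uniform strengths.
    p = {s: 1.0 / len(systems) for s in systems}
    for _ in range(iters):
        new_p = {}
        for i in systems:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for pair, count in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += count / (p[i] + p[j])
            new_p[i] = wins[i] / denom if denom > 0 else p[i]
        total = sum(new_p.values())
        new_p = {s: v / total for s, v in new_p.items()}
        delta = max(abs(new_p[s] - p[s]) for s in systems)
        p = new_p
        if delta < tol:
            break
    return p


def rank_by_dimension(records):
    """Rank systems separately per perceptual dimension.

    records: iterable of (dimension, winner, loser) triples.
    Returns dict dimension -> list of (system, strength), strongest first.
    """
    by_dim = defaultdict(list)
    for dim, winner, loser in records:
        by_dim[dim].append((winner, loser))
    return {
        dim: sorted(bradley_terry(pairs).items(), key=lambda kv: kv[1], reverse=True)
        for dim, pairs in by_dim.items()
    }


if __name__ == "__main__":
    # Toy, hypothetical judgments: not data from the paper.
    toy = [("naturalness", "sys_A", "sys_B")] * 3 + [("naturalness", "sys_B", "sys_A")]
    toy += [("pronunciation", "sys_B", "sys_A")] * 3 + [("pronunciation", "sys_A", "sys_B")]
    for dim, ranking in rank_by_dimension(toy).items():
        print(dim, [(s, round(v, 3)) for s, v in ranking])
```

Fitting each dimension separately keeps, for example, a system that sounds natural but mispronounces code-mixed words from having one weakness averaged away by the other, which is the point of a multidimensional protocol.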
The proliferation of foundation models demands robust evaluation methods, and India's linguistic diversity makes the country a crucial testbed for voice AI development, in line with its push for digital sovereignty.
This research provides a scalable, multidimensional evaluation framework for TTS, which is critical for developing high-quality voice AI in linguistically diverse regions and directly supports efforts to build indigenous AI capabilities.
The emphasis on linguistically controlled, multidimensional evaluation for TTS in Indian languages reflects a growing sophistication in AI development tailored to regional needs, moving beyond Western-centric evaluations.
- · Indian AI companies
- · Multilingual voice AI developers
- · Indian government
- · Users of Indian languages
- · Monolingual AI developers
- · Unoptimized global TTS models
Improved quality and adoption of AI-powered voice interfaces in Indic languages.
Accelerated development of localized AI models and services, reducing reliance on global solutions that often perform worse in these languages.
Enhanced data sovereignty and digital independence for linguistically diverse nations, fostering distinct AI ecosystems.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.CL