Closing the Quality Gap in Low-Resource Text-to-Speech: LoRA Fine-Tuning of VoxCPM2 for Khmer and Korean

arXiv:2606.26618v1 (announce type: new). Abstract: Large pretrained text-to-speech (TTS) models sound almost human for well-resourced languages, but much worse for languages that are rare in their training data. We study this quality gap for Khmer and Korean using VoxCPM2, a 2.4B-parameter, tokenizer-free TTS model that joins a MiniCPM-4 language-model backbone with a flow-matching diffusion decoder. We build one shared, language-tagged corpus of about 26 hours and adapt VoxCPM2 with a single Low-Rank Adaptation (LoRA) adapter, trained on both languages at once and added to both the language mode
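The abstract names LoRA and a language-tagged corpus, but the truncated text above gives no rank, scaling factor, target layers, or tag format. The following is therefore only a generic sketch of the low-rank update idea, not the paper's implementation: `LoRALinear`, `tag_text`, the `<km>`/`<ko>` tag strings, and every numeric value are hypothetical choices made for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a low-rank update (alpha / rank) * B @ A.

    Only A and B would be trained; W stays fixed. All sizes here are
    illustrative assumptions, not values from the paper.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A gets a small random init; B starts at zero, so the adapter
        # is a no-op before any training step.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def tag_text(lang, text):
    """Prefix a transcript with a language tag (hypothetical format)."""
    return f"<{lang}> {text}"


if __name__ == "__main__":
    d = 8
    identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    layer = LoRALinear(identity, rank=2, alpha=4)
    print(layer.forward([1.0] * d))
    print(layer.trainable_params(), "trainable vs", d * d, "frozen")
    print(tag_text("km", "example sentence"))
```

In this sketch the adapter adds rank × (d_in + d_out) trainable values per layer while the d_in × d_out pretrained weight stays frozen, which is why a single adapter can plausibly be shared across two languages at low cost.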
As large pretrained AI models spread, their uneven performance across languages has become more visible, driving efforts to close this "quality gap" for less-resourced languages.
Better low-resource language support in TTS models widens AI's usefulness and accessibility worldwide, with effects on communication, education, and digital inclusion, and it may reduce AI's dependence on a handful of dominant languages.
Local-language AI applications become more viable and of higher quality, which could foster domestic AI development and narrow the digital divide for underserved linguistic groups.
- AI developers focused on low-resource languages
- Users of languages like Khmer and Korean
- Governments promoting linguistic diversity in tech
- Companies offering localized AI services
- Monolingual AI content providers
- AI models without effective adaptation mechanisms
High-quality text-to-speech becomes available for a wider array of languages, directly benefiting localized digital content creation.
This improved accessibility could accelerate the development of domestic AI applications and digital economies in countries where these languages are spoken.
It may contribute to a more diversified global AI landscape, reducing the dominance of a few major languages in AI development and application.
Read at arXiv cs.CL