Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

arXiv:2605.27383v1 (announce type: cross). Abstract: Spoken Language Models (SLMs) have emerged as a promising paradigm for speech synthesis by bypassing explicit grapheme-to-phoneme pipelines. However, their effectiveness in low-resource languages remains fundamentally limited by the scarcity of transcribed speech. In practice, synthetic data has become the primary strategy for scaling SLMs in such settings, providing reliable phonetic supervision when real data is insufficient. In this work, we show that this reliance introduces a fundamental trade-off, which we term the Stability-Expressivity
The paper identifies a fundamental trade-off in the prevailing method of scaling Spoken Language Models (SLMs) for low-resource languages, namely reliance on synthetic data, and proposes methods to mitigate its limitations.
This research addresses a critical technical hurdle in extending advanced AI capabilities to more of the world's languages, with implications for the accessibility and equity of AI development.
Development of SLMs for low-resource languages is shifting toward more nuanced strategies that balance stability and expressivity when using synthetic data.
- AI researchers in speech technology
- Developers of SLMs for diverse languages
- Populations speaking low-resource languages
- AI development approaches that over-rely on simple synthetic data scaling
- Monolingual AI services
- Purely data-scarce language communities unable to generate synthetic data
Improved performance and broader applicability of Spoken Language Models in a wider array of languages.
Accelerated development of AI tools and services tailored for historically underserved linguistic communities.
Enhanced digital inclusion and reduced language barriers in accessing cutting-edge AI technologies and information.
Read at arXiv cs.AI