PASQA: Pitch-Accent-Focused Speech Quality Assessment Model Trained on Synthetic Speech with Accent Errors

arXiv:2606.20137v1 (announce type: cross)

Abstract: Existing mean opinion score (MOS) prediction models typically predict utterance-level naturalness MOS and can be insensitive to localized pitch-accent errors. We propose Pitch-Accent-focused Speech Quality Assessment (PASQA), which explicitly targets pitch-accent correctness. To train our model, we construct a controlled Japanese accent-error dataset by changing accent patterns using an accent-controllable text-to-speech system, and compute a pseudo accent-quality score from the accent-error rate. PASQA builds on self-supervised representations
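The abstract says a pseudo accent-quality score is computed from the accent-error rate but does not give the formula. The following is a minimal sketch of one way such a score might be derived, not the authors' method. It assumes the error rate is the fraction of accent phrases whose accent-nucleus position was altered, and that the score is a linear map onto a 1-to-5 MOS-like scale. The function names `accent_error_rate` and `pseudo_accent_score` are hypothetical.

```python
from typing import Sequence


def accent_error_rate(reference: Sequence[int], altered: Sequence[int]) -> float:
    """Fraction of accent phrases whose accent-nucleus position differs.

    Each entry is the accent-nucleus position of one accent phrase
    (0 = unaccented). This per-phrase definition is an assumption;
    the abstract does not specify how the rate is computed.
    """
    if len(reference) != len(altered):
        raise ValueError("both sequences must cover the same accent phrases")
    if not reference:
        raise ValueError("at least one accent phrase is required")
    errors = sum(1 for ref, alt in zip(reference, altered) if ref != alt)
    return errors / len(reference)


def pseudo_accent_score(error_rate: float, low: float = 1.0, high: float = 5.0) -> float:
    """Map an error rate in [0, 1] linearly onto [low, high].

    The linear mapping and the 1-to-5 range are assumptions chosen to
    mirror a MOS scale: no errors gives `high`, all errors gives `low`.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    return high - (high - low) * error_rate


# Hypothetical utterance with four accent phrases; one accent was changed.
reference_accents = [1, 0, 2, 0]
altered_accents = [1, 3, 2, 0]
rate = accent_error_rate(reference_accents, altered_accents)
score = pseudo_accent_score(rate)
```

Under these assumptions the score can be generated automatically for every synthesized utterance, because the accent pattern that was deliberately changed is known at synthesis time and no human listening test is needed.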
As text-to-speech (TTS) systems grow more sophisticated and more widely deployed, quality assessment needs to move beyond general naturalness scores and detect specific linguistic errors such as incorrect pitch accent.
Improving speech quality assessment models, particularly for elements like pitch accent, is crucial for developing more natural and effective AI speech interfaces and for higher-fidelity synthetic media.
The focus of speech quality assessment may shift from general naturalness toward specific, linguistically motivated errors, which could enable more targeted improvements in synthetic speech generation.
- AI speech synthesis developers
- Voice AI companies
- Users of synthetic speech
- Companies with low-quality TTS offerings
More accurate and nuanced quality control tools become available for synthetic speech.
Such tools could noticeably improve the perceived naturalness and correctness of AI-generated voices, especially in languages with complex accent systems.
The enhanced quality of synthetic speech could accelerate the adoption of voice AI across various sectors, from customer service to content creation.
Read at arXiv cs.CL