ANCHOR: Autoregressive Non-intrusive Chunk-Ordered Refinement for Joint Multi-Resolution Speech Quality Modeling

arXiv:2606.10233v1 (Announce Type: cross)

Abstract: While speech quality is typically assessed on complete utterances, streaming and generative systems require incremental estimation from partial audio. Existing predictors assume full context, degrading on prefix-constrained inputs. Extending ARECHO, we propose ANCHOR, reformulating incremental assessment as a multi-resolution autoregressive task. It models chunk- and utterance-level quality within a single decoder using dual-resolution tokens and a resolution-aware hierarchy for coarse-to-fine refinement. Experiments show substantial robustness
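To make prefix-constrained, dual-resolution assessment more concrete, here is a toy sketch in Python. It is not ANCHOR's implementation: the chunk size, the `Token` layout, the `"chunk"`/`"utt"` resolution labels, and the running-mean utterance estimate are all hypothetical stand-ins chosen for illustration. It shows how an utterance can be cut into growing prefixes, and how one output sequence could interleave chunk-level scores with an utterance-level estimate that is refined as each new chunk arrives.

```python
# Toy sketch, NOT the ANCHOR method: all names and the running-mean
# utterance estimate are hypothetical illustrations.
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Token:
    resolution: str  # hypothetical label: "chunk" or "utt"
    index: int       # index of the most recent chunk covered by this token
    value: float     # quality score, e.g. a MOS-like value (illustrative)


def chunk_prefixes(num_samples: int, chunk_size: int) -> List[int]:
    """End offsets of growing prefixes: the audio a streaming scorer sees after each chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ends = list(range(chunk_size, num_samples, chunk_size))
    ends.append(num_samples)  # final (possibly shorter) chunk closes the utterance
    return ends


def dual_resolution_sequence(chunk_scores: Sequence[float]) -> List[Token]:
    """Interleave one chunk-level token and one utterance-level token per chunk.

    The utterance-level value is a stand-in: the mean of the chunk scores seen
    so far, so the coarse estimate is refined as the prefix grows.
    """
    seq: List[Token] = []
    total = 0.0
    for i, score in enumerate(chunk_scores):
        total += score
        seq.append(Token("chunk", i, score))
        seq.append(Token("utt", i, total / (i + 1)))
    return seq


if __name__ == "__main__":
    # 2.5 s of 16 kHz audio cut into 1 s chunks -> three prefixes
    print(chunk_prefixes(40000, 16000))  # [16000, 32000, 40000]
    for tok in dual_resolution_sequence([3.0, 4.0, 5.0]):
        print(tok)
```

Emitting the chunk-level token before the utterance-level token loosely mirrors the idea of a single decoder producing both resolutions in one stream; the actual token ordering, quantization, and conditioning used by ANCHOR are not specified in this brief.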
Continuing improvements in AI speech-processing models call for more efficient and accurate real-time quality assessment; ANCHOR targets the limitations of existing full-context methods.
This development could enable more reliable and dynamic quality monitoring for real-time AI agents and streaming generative AI, which is critical for user experience and system performance.
The ability to incrementally and accurately assess speech quality will improve the robustness of streaming AI applications and potentially open new avenues for adaptive speech generation and processing.
- · AI speech processing companies
- · Real-time communication platforms
- · Generative AI developers
- · Systems reliant on batch speech quality assessment
- · AI models without incremental evaluation capabilities
Improved performance and user satisfaction in applications like live translation, voice assistants, and AI-generated audio.
Faster iteration and deployment cycles for new speech-based AI features due to more responsive quality feedback.
The integration of such quality metrics directly into AI model training loops, leading to self-optimizing speech models in real-time environments.
Read at arXiv cs.LG