Contrastive Training with LLM-generated Near-Misses for Robust Code-Switching Speech Recognition

arXiv:2606.06985v1. Abstract: Code-switching (CS), the alternation between multiple languages within a single utterance, remains challenging for Automatic Speech Recognition (ASR). To address this issue, we propose a Point-of-Interest (POI)-aware contrastive training framework that improves recognition at CS-critical regions. We first identify CS spans by adopting a POI detection method from the literature, then construct acoustically plausible near-miss hypotheses by perturbing POIs in ASR N-best outputs and expanding candidates with a large language model. Hard but plausible nega
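The pipeline the abstract outlines (perturb POIs using N-best outputs, then train contrastively against the resulting near-misses) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the function names `poi_near_misses` and `contrastive_loss` are hypothetical, hypotheses are assumed to be token-aligned with the reference, the LLM expansion step is omitted, and the softmax-style loss is a generic contrastive objective standing in for whatever loss the authors actually use.

```python
import math


def poi_near_misses(reference, nbest, poi_spans):
    """Build near-miss negatives by copying N-best tokens into the
    reference at POI spans only (hypothetical helper, not from the paper)."""
    negatives = []
    for hyp in nbest:
        if len(hyp) != len(reference):
            continue  # simplification: skip hypotheses that are not token-aligned
        for start, end in poi_spans:
            if hyp[start:end] != reference[start:end]:
                # Keep the reference outside the POI, swap in the N-best tokens inside it.
                candidate = reference[:start] + hyp[start:end] + reference[end:]
                if candidate not in negatives:
                    negatives.append(candidate)
    return negatives


def contrastive_loss(pos_score, neg_scores):
    """Generic softmax contrastive loss: -log p(reference | reference + negatives).
    Scores are assumed to be model log-scores for each hypothesis."""
    scores = [pos_score] + list(neg_scores)
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - pos_score


# Toy Spanish-English utterance; the POI covers the code-switched tokens.
reference = ["quiero", "un", "cheeseburger", "grande"]
nbest = [
    ["quiero", "un", "cheese", "burger", "grande"],  # length mismatch, skipped
    ["quiero", "un", "cheeseburner", "grande"],
    ["quiere", "un", "cheeseburger", "grand"],
]
poi_spans = [(2, 4)]  # (start, end) token indices, end exclusive

negatives = poi_near_misses(reference, nbest, poi_spans)
```

In this toy run, the third hypothesis differs from the reference both inside and outside the POI, but only its POI tokens are copied, so the negative stays identical to the reference everywhere else. That is one simple way to keep negatives "hard but plausible"; a real system would also need alignment for hypotheses of different lengths.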
The increasing sophistication of Large Language Models (LLMs) and their integration with other AI techniques allow for novel approaches to complex speech recognition challenges like code-switching.
Improving code-switching ASR is crucial for seamless human-computer interaction in multilingual societies and for expanding AI accessibility and utility globally.
If the approach generalizes, ASR systems should handle mixed-language input more robustly, leading to more accurate and reliable transcription and voice interfaces for diverse user groups.
- ASR developers
- Multilingual users
- AI service providers
- Global tech companies
- Legacy ASR systems
Increased accuracy in code-switching speech recognition will lead to wider adoption of voice-controlled interfaces in multilingual contexts.
Enhanced ASR capabilities will enable more effective data analysis and insights from multilingual audio content, impacting sectors like customer service and intelligence.
This technological advancement could indirectly accelerate the development of more sophisticated and inclusive AI agents capable of understanding and engaging with a broader human linguistic spectrum.
Read at arXiv cs.CL