SIGNAL · AI · May 26, 2026, 4:00 AM · Signal 75 · Short term

Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs

arXiv:2605.23975v1. Abstract: Audio large language models (Audio LLMs) exhibit systematic failures in transcribing code-switching speech despite strong multilingual capabilities. Focusing on English-Mandarin, we identify three failure modes: language omission, translation-instead-of-transcription, and hallucination. We apply Direct Preference Optimization (DPO) to align models, constructing preference pairs in which chosen responses preserve mixed-language content while rejected responses mimic failure patterns. Training three Audio LLMs on 100K pairs (570 hours), we observe
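The preference-pair setup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`omit_english`, `build_pair`, `dpo_loss`), the `prompt`/`chosen`/`rejected` field names, the example sentence, and `beta=0.1` are all hypothetical. Only the "language omission" failure is simulated, and the loss is the standard per-pair DPO objective computed from sequence log-probabilities.

```python
import math
import re

# All names below are hypothetical and chosen for illustration only;
# the abstract does not describe the actual data-construction code.

LATIN_WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")


def omit_english(transcript: str) -> str:
    """Mimic the 'language omission' failure by deleting Latin-script words."""
    without_english = LATIN_WORD.sub(" ", transcript)
    return re.sub(r"\s+", " ", without_english).strip()


def build_pair(prompt: str, transcript: str, rejected: str) -> dict:
    """Pair the faithful mixed-language transcript with a failure-style output."""
    return {"prompt": prompt, "chosen": transcript, "rejected": rejected}


def dpo_loss(logp_chosen: float, ref_logp_chosen: float,
             logp_rejected: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    Inputs are summed token log-probabilities under the policy being trained
    and under a frozen reference model. beta=0.1 is an assumed value.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up code-switched sentence, used only to show the pair structure.
reference = "我想订一个 meeting room 明天"
pair = build_pair("Transcribe the audio.", reference, omit_english(reference))
```

The loss shrinks as the policy assigns relatively more probability to the chosen transcript than the reference model does. Rejected responses for the other two failure modes (translation-instead-of-transcription and hallucination) would need a translation step or sampled model outputs, which this sketch leaves out.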

Why this matters

Why now

Audio LLMs are being developed and deployed quickly, which is exposing their limitations on linguistically complex tasks such as code-switching and making fixes pressing for real-world use.

Why it’s important

Improving code-switching ability in Audio LLMs is critical for expanding their utility and accuracy in multilingual societies, reducing friction in human-computer interaction across diverse linguistic contexts.

What changes

If the approach holds up, Audio LLMs should transcribe mixed-language speech more reliably and avoid the failure modes the paper identifies (language omission, translation-instead-of-transcription, and hallucination), starting with English-Mandarin.

Winners

· AI developers
· Multilingual users
· Speech recognition companies
· Global enterprise

Losers

· Monolingual speech recognition solutions

Second-order effects

Direct

Audio LLMs would transcribe code-switched conversations more accurately, making them more reliable for downstream understanding tasks.

Second

The enhanced multilingual capabilities of these models will accelerate their adoption in customer service, legal, medical, and educational settings in diverse linguistic markets.

Third

This improvement could reduce language barriers in digital communication and services, potentially fostering greater cross-cultural collaboration and access to information.

Editorial confidence: 90 / 100 · Structural impact: 60 / 100

Original report

Read at arXiv cs.CL

#cs.CL #cs.SD
