
arXiv:2605.23912v1. Abstract: We present Raon-Speech, a top-performing 9B-parameter speech language model (SpeechLM) for English and Korean speech understanding, answering, and generation, and Raon-SpeechChat, a high-performing full-duplex extension for natural real-time conversation. Raon-Speech successfully transforms a pre-trained LLM into a SpeechLM that both understands and generates speech while preserving strong text capabilities. It trains on 1.38M hours of highly curated English and Korean speech and text datasets with the following training stages: (1) speech module
Raon-Speech reflects rapid progress in multimodal AI, specifically in adding speech capabilities to large language models (LLMs), and growing demand for real-time conversational AI.
The work marks progress toward more versatile and natural AI interfaces that can understand, generate, and converse in speech, which could broaden AI's use across industries and user experiences.
Converting a pre-trained LLM into a SpeechLM that understands, answers, and generates speech while preserving strong text capabilities could make AI interaction more seamless.
- AI developers
- Customer service sector
- Voice assistant developers
- Multilingual communication platforms
- Traditional speech-to-text providers
- Monolingual AI solutions
Further integration of advanced speech AI into everyday devices and services becomes more viable.
Reduced friction in human-computer interaction could accelerate adoption of AI in diverse workflows.
The proliferation of context-aware, full-duplex conversational AI systems could fundamentally alter communication norms and expectations across various industries.
Read at arXiv cs.CL