
arXiv:2607.01960v1. Abstract: In this paper, we describe NAVER LABS Europe's submission to the instruction-following speech processing short track at IWSLT 2026. We participate again in the constrained setting, developing systems capable of jointly performing ASR, ST, and SQA from English speech into Chinese, Italian, and German. Building on our previous submission, ranked first in last year's short track, we update our multi-stage training pipeline by replacing the speech projector with SpeechMapper, a method for learning a speech-to-LLM embedding projector using only ASR da
The paper reports an updated multi-stage training pipeline for instruction-following speech processing. It builds on the team's submission that ranked first in last year's short track and replaces the speech projector with SpeechMapper.
The systems perform multilingual speech processing and instruction following jointly, capabilities relevant to more natural AI agents and to applications across different language environments.
Handling ASR, ST, and SQA together, from English speech into Chinese, Italian, and German, points toward more general-purpose speech-to-text and speech interaction systems.
- NAVER LABS Europe
- Multilingual AI developers
- Global AI users
- Speech processing research
- Monolingual AI solutions
- Less efficient speech processing techniques
Instruction-following performance for speech-based AI is expected to improve across multiple languages.
This will accelerate the adoption of advanced AI agents in diverse linguistic markets, reducing friction in human-AI interaction.
It could lead to the emergence of truly global AI assistants and services, transcending language barriers and fostering greater digital inclusion.
Read at arXiv cs.CL