When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval

arXiv:2606.13537v1. Abstract: While mixed-language querying is ubiquitous in multilingual communities, the sensitivity of dense retrievers to such queries remains poorly understood. We present a ratio-controlled study on mMARCO that systematically evaluates retrieval performance by varying the mixing proportion of parallel query translations via embedding-level mixing -- constructing mixed queries as an interpolation of monolingual embeddings. Experiments with BGE-M3 demonstrate that an optimal mixing ratio outperforms the best monolingual endpoint in 88/105 cases. We uncover
The proliferation of multilingual AI models and global information access makes understanding mixed-language query performance increasingly critical for AI developers.
Improving multilingual dense retrieval directly enhances the utility and accessibility of AI systems for non-English speakers, broadening AI's global impact and market.
The reported results suggest that interpolating the embeddings of parallel query translations at a tuned mixing ratio can improve retrieval over either monolingual query alone: with BGE-M3 on mMARCO, the best ratio outperformed the best monolingual endpoint in 88 of 105 cases.
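As a rough illustration of what embedding-level mixing might look like, the sketch below linearly interpolates two query embeddings and ranks documents by dot product across a sweep of mixing ratios. Everything in it is an assumption made for illustration: the function names, the toy three-dimensional vectors, the re-normalization step, and the dot-product scoring are not taken from the paper, and no real encoder such as BGE-M3 is called.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mix_embeddings(emb_a, emb_b, alpha):
    """Interpolate (1 - alpha) * emb_a + alpha * emb_b, then re-normalize.

    Re-normalizing is an assumption of this sketch, not a detail from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    mixed = [(1.0 - alpha) * a + alpha * b for a, b in zip(emb_a, emb_b)]
    return normalize(mixed)

def rank_documents(query_emb, doc_embs):
    """Return document indices sorted by descending dot-product score."""
    scores = [sum(q * d for q, d in zip(query_emb, doc)) for doc in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)

# Toy stand-ins for the embeddings of one query in two languages (hypothetical values).
q_lang_a = normalize([1.0, 0.0, 0.2])
q_lang_b = normalize([0.0, 1.0, 0.2])

# Toy document embeddings (hypothetical values).
docs = [normalize(v) for v in ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])]

# Sweep the mixing ratio; alpha = 0 and alpha = 1 are the monolingual endpoints.
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    q_mix = mix_embeddings(q_lang_a, q_lang_b, alpha)
    print(alpha, rank_documents(q_mix, docs))
```

In a real evaluation the toy vectors would be replaced by encoder outputs, and each ratio would be scored with a retrieval metric over a full query set rather than by inspecting a single ranking.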
- Multilingual AI users
- AI product developers
- Global information platforms
- Monolingual AI systems
Increased effectiveness and adoption of AI services in non-English speaking markets.
Reduced language barriers for information access and knowledge sharing globally, potentially accelerating innovation.
Enhanced competition among AI providers to offer superior multilingual capabilities, driving further research and development.
Read at arXiv cs.CL