Benchmarking Google Embeddings 2 against Open-Source Models for Multilingual Dense Retrieval and RAG Systems

arXiv:2605.23618v1. Abstract: We benchmark Google Embeddings 2 (GE2), a Vertex-AI-hosted bi-encoder with a 2,048-token context and explicit task-type conditioning, against five open-source alternatives: BGE-M3, E5-large, Multilingual-E5-large (mE5-L), LaBSE, and Paraphrase-Multilingual-MPNet (mMPNet). Evaluation covers four BEIR subsets, a synthetic Italian RAG corpus, a chunking ablation over five chunk sizes (in tokens) and three strategies, and per-query latency on commodity CPU hardware. GE2 ranks first on every task, achieving BEIR avg. nDCG@10 = 0.638 and IT-RAG-Bench nDCG@10 =
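For readers unfamiliar with the headline metric, the following is a minimal sketch of how nDCG@10 is commonly computed for a single query. It is not taken from the paper: the function names and the `judged_relevances` argument are hypothetical, and it assumes the linear-gain variant (some implementations use an exponential gain, 2^rel - 1, instead).

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance labels.

    Assumption: linear gain (rel / log2(rank + 1), ranks starting at 1).
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, judged_relevances, k=10):
    """nDCG@k for one query.

    ranked_relevances: relevance labels of the retrieved documents,
        in the order the system returned them.
    judged_relevances: relevance labels of all judged documents for
        the query, used to build the ideal ranking.
    """
    ideal = sorted(judged_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Example: two relevant documents exist; the system returns them at ranks 1 and 3.
score = ndcg_at_k([1, 0, 1], [1, 1, 0], k=10)
```

A benchmark-level figure such as the BEIR average is then typically the mean of these per-query scores within each dataset, averaged across datasets.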
The paper benchmarks Google Embeddings 2 shortly after its release, reflecting ongoing advancements and competition in the AI embedding space.
GE2's first-place ranking on every task in this benchmark points to Google's continued strength in foundational embedding models, which underpin RAG systems and other retrieval applications.
Within this evaluation, Google Embeddings 2 sets a new performance bar for multilingual dense retrieval, potentially influencing industry adoption and open-source development priorities.
- Vertex-AI users
- Enterprises using RAG systems
- Open-source embedding models
- Competitors without equivalent proprietary models
Increased adoption of Google Embeddings 2 for enterprise RAG implementations.
Open-source model developers will intensify efforts to close the performance gap, potentially leading to rapid innovation.
Heightened competition for AI talent in NLP and embedding research as companies strive for market leadership.
Read at arXiv cs.CL