RAG-Pull: Turning Retrieval into a Code-Injection Channel via Invisible Unicode Perturbations

arXiv:2510.11195v2. Abstract: Retrieval-Augmented Generation (RAG) improves the reliability and trustworthiness of LLM responses and reduces hallucination without requiring model retraining. It does so by adding external data to the LLM's context. We develop a new class of black-box attack, RAG-Pull, that inserts hidden UTF characters into queries or external code repositories, redirecting retrieval toward malicious code and thereby breaking the models' safety alignment. We observe that query and code perturbations alone can shift retrieval toward attac
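To make the idea of a hidden perturbation concrete, here is a minimal Python sketch. It is not the paper's method: the choice of U+200B (zero-width space) and the `perturb` helper are assumptions made purely for illustration. It only shows that a string can render identically on screen while differing at the code-point and byte level, which is the property such an attack would rely on.

```python
# Illustrative only: U+200B is an assumed example of an invisible character;
# the abstract does not say which characters RAG-Pull actually uses.
ZERO_WIDTH_SPACE = "\u200b"


def perturb(text: str, every: int = 4) -> str:
    """Hypothetical helper: insert a zero-width space between chunks of `every` characters."""
    chunks = [text[i:i + every] for i in range(0, len(text), every)]
    return ZERO_WIDTH_SPACE.join(chunks)


clean = "parse json file"
hidden = perturb(clean)

# Most renderers display both strings identically, yet they are not equal.
print(clean == hidden)
print(len(clean), len(hidden))                                  # code points
print(len(clean.encode("utf-8")), len(hidden.encode("utf-8")))  # UTF-8 bytes
```

Because a tokenizer or embedding model receives the perturbed code points rather than the rendered text, the two strings may be represented differently even though a human reviewer would see no difference.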
RAG systems are being developed and deployed faster than the security measures that protect them, which leaves an opening for new attack vectors.
RAG-Pull demonstrates a new way to compromise LLM safety, one that directly undermines the integrity and trustworthiness of AI systems that rely on external data.
The attack challenges the assumption that RAG inherently improves LLM safety: hidden perturbations can turn retrieval into a code-injection channel.
- Cybersecurity firms
- AI safety researchers
- Developers of robust RAG infrastructure
- Organizations using vulnerable RAG systems
- Developers of LLMs without robust input sanitization
Increased focus on input sanitization and verification for data used in RAG systems.
Potential for new regulations or industry standards for securing AI systems against such code injection attacks.
Erosion of public trust in AI applications if such vulnerabilities are exploited in high-stakes scenarios.
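As a rough illustration of what input sanitization for RAG data could look like, the sketch below flags and strips characters in Unicode general category `Cf` (format characters, which include zero-width spaces and joiners). The function names `find_invisible` and `strip_invisible` are hypothetical, and treating `Cf` as the filter is an assumption: it is one plausible heuristic, not a defense described in the source.

```python
import unicodedata


def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) pairs for format characters (category Cf)."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_invisible(text: str) -> str:
    """Drop every character whose Unicode general category is Cf."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


query = "parse\u200b json\u200d file"
print(find_invisible(query))
print(strip_invisible(query))
```

A filter this blunt has costs: some `Cf` characters are legitimate, for example the zero-width joiner in emoji sequences and in several scripts, and other invisible code points fall outside `Cf`. A real deployment would probably need an allow-list suited to its content and might log or reject suspicious inputs rather than silently rewriting them.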
Read at arXiv cs.AI