A Systems-Level Analysis of Sensitivity, Robustness, and Stability in Retrieval-Augmented Generation

arXiv:2606.28337v1 (Announce Type: cross)
Abstract: Retrieval-Augmented Generation (RAG) systems are often evaluated using final answer accuracy, even though their failures can originate from preprocessing, retrieval, context packing, or generation. This paper presents a controlled empirical study of RAG sensitivity, robustness, and stability across 56 experimental runs. We evaluate how chunk size, retrieval depth (top k), embedding-based reranking, probabilistic retrieval noise, and repeated seeded runs affect retrieval, context packing, and generation behavior. Using a fixed 500-question QA su
As Retrieval-Augmented Generation (RAG) systems grow more complex and move from research into critical applications, a systems-level understanding of where they fail becomes necessary.
The paper looks past final answer accuracy to examine sensitivity, robustness, and stability across configurations, properties that matter for trustworthy deployment at scale.
RAG development will likely shift toward more rigorous, multi-stage evaluation that attributes failures to specific pipeline stages (preprocessing, retrieval, context packing, or generation) instead of relying solely on end-to-end accuracy, which in turn should influence design and testing protocols.
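To make stage-level evaluation concrete, here is a minimal Python sketch that attributes each failed question to the earliest pipeline stage that went wrong and measures how stable answers are across repeated seeded runs. It is an illustration under stated assumptions, not the paper's method: the `QuestionTrace` record, its field names, the attribution rule, and the stability metric are all hypothetical, and the sample data is invented purely for demonstration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionTrace:
    """Hypothetical per-question log for one RAG run (not from the paper)."""
    question_id: str
    gold_retrieved: bool   # gold passage appeared in the top-k results
    gold_in_context: bool  # gold passage survived context packing
    answer_correct: bool   # final answer was judged correct


def first_failing_stage(trace: QuestionTrace) -> str:
    """Attribute a wrong answer to the earliest stage that failed.

    Assumption: a correct answer counts as "ok" even if the gold passage
    was never retrieved (the model may have answered from memory).
    """
    if trace.answer_correct:
        return "ok"
    if not trace.gold_retrieved:
        # Preprocessing problems (e.g. poor chunking) also surface here.
        return "retrieval"
    if not trace.gold_in_context:
        return "context_packing"
    return "generation"


def stage_breakdown(traces: list[QuestionTrace]) -> Counter:
    """Count questions per outcome label for a single run."""
    return Counter(first_failing_stage(t) for t in traces)


def answer_stability(runs: list[list[QuestionTrace]]) -> float:
    """Fraction of questions whose correctness is identical in every run."""
    outcomes: dict[str, set[bool]] = {}
    for run in runs:
        for trace in run:
            outcomes.setdefault(trace.question_id, set()).add(trace.answer_correct)
    if not outcomes:
        return 0.0
    stable = sum(1 for seen in outcomes.values() if len(seen) == 1)
    return stable / len(outcomes)


# Invented demonstration data: two seeded runs over the same four questions.
run_a = [
    QuestionTrace("q1", True, True, True),
    QuestionTrace("q2", False, False, False),
    QuestionTrace("q3", True, False, False),
    QuestionTrace("q4", True, True, False),
]
run_b = [
    QuestionTrace("q1", True, True, True),
    QuestionTrace("q2", False, False, False),
    QuestionTrace("q3", True, False, False),
    QuestionTrace("q4", True, True, True),  # flips to correct under another seed
]

breakdown = stage_breakdown(run_a)
stability = answer_stability([run_a, run_b])
```

In this toy data both runs could be summarized by a single accuracy figure, yet `breakdown` separates one retrieval failure, one context-packing failure, and one generation failure, and `stability` shows that one of the four questions changes outcome between seeds. That is the kind of information end-to-end accuracy alone hides.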
- AI researchers focusing on explainability and robustness
- Enterprises deploying mission-critical AI
- Tools for RAG evaluation and monitoring
- Developers relying solely on 'final answer accuracy'
- Simplistic RAG deployment strategies
Improved understanding of RAG failure modes leads to more reliable and predictable AI system development.
Increased demand for specialized tooling and expertise in RAG system diagnostics and optimization, fostering a new sub-sector within AI engineering.
Enhanced trust and broader adoption of AI agents in sensitive domains, as their underlying RAG components become demonstrably more resilient.
Read at arXiv cs.AI