Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

arXiv:2605.29084v1 (announce type: cross). Abstract: A retrieval-augmented generation (RAG) system deployed over a multi-author institutional corpus can give a different answer to the same question depending on which source it retrieves -- a failure mode the dominant single-gold-answer paradigm cannot diagnose. We argue that source-dependence is a missing axis of NLP evaluation, and that auditing it means shifting the unit of evaluation from answer correctness to the inter-source relationship. We make this concrete in transplant patient education, where institutional sources demonstrably disagree
This research highlights a growing practical concern as RAG systems are integrated into critical domains like healthcare, where source credibility and answer consistency are paramount.
It exposes a fundamental challenge for deploying reliable AI systems in multi-source, high-stakes environments: the single-gold-answer evaluation paradigm cannot detect answers that change with the retrieved source.
The focus of evaluation expands from answer correctness alone to the relationship between sources and the auditability of source-dependence, which complicates current RAG deployment strategies.
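To make the shift in the unit of evaluation concrete, here is a minimal sketch of a source-dependence check. It is not the paper's protocol, which this brief does not describe. It assumes the same question has already been answered once per source, and it uses exact match after whitespace and case normalization as a crude stand-in for a real agreement judge. The function names and the toy answer strings are hypothetical and invented for illustration.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def group_sources_by_answer(answers_by_source: dict[str, str]) -> dict[str, list[str]]:
    """Map each distinct (normalized) answer to the sources that produced it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for source, answer in sorted(answers_by_source.items()):
        groups[normalize(answer)].append(source)
    return dict(groups)


def is_source_dependent(answers_by_source: dict[str, str]) -> bool:
    """A question is source-dependent here if its sources yield more than one distinct answer."""
    return len(group_sources_by_answer(answers_by_source)) > 1


# Toy, invented answers to one question, keyed by hypothetical source names.
answers = {
    "source_a": "Resume light exercise after six weeks",
    "source_b": "resume light  exercise after six weeks",
    "source_c": "Resume light exercise after twelve weeks",
}
groups = group_sources_by_answer(answers)
dependent = is_source_dependent(answers)
```

Note that the check never consults a gold answer: it reports only how the sources relate to each other. A practical audit would replace the exact-match comparison with a semantic or clinician-reviewed agreement judgment.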
- AI auditing and evaluation tool developers
- Organizations with well-curated, conflict-free internal data
- Specialized RAG system developers for critical applications
- Developers of RAG systems relying solely on single-gold-answer metrics
- Healthcare providers deploying un-audited multi-source RAG
- Sectors with highly conflicting or opinionated information sources
This paper will likely spur new research and development in RAG evaluation metrics and auditing tools.
It could lead to regulatory pressure or industry standards for transparency and auditability in RAG systems, especially in sensitive applications.
Organizations may consolidate data sources or invest heavily in data harmonization to mitigate source-dependence issues, impacting data governance strategies.
Read at arXiv cs.AI