How Do LLMs Cite? A Mechanistic Interpretation of Attribution in Retrieval-Augmented Generation

arXiv:2606.28358v1. Abstract: Retrieval-Augmented Generation (RAG) aims to enhance the trustworthiness of Large Language Models (LLMs) by grounding their outputs in external documents, often using inline citations for verifiability. However, the faithfulness of these citations -- whether the model genuinely uses a source to generate an answer -- remains a critical, unverified assumption. This paper offers the first mechanistic account of how a large language model decides whether to attach an inline citation while answering a factoid question. Using the Llama-3.1-8B-Instruc
The proliferation of RAG systems and the increasing scrutiny of LLM trustworthiness call for a deeper understanding of how these models attribute their outputs internally.
Understanding how LLMs cite is crucial for building more reliable AI systems, enabling verifiability, and mitigating hallucination issues in critical applications.
This research provides a foundational mechanistic interpretation of LLM citation behavior, moving beyond black-box assumptions towards auditable and controllable attribution.
- AI ethicists
- RAG system developers
- Enterprises deploying LLMs
- Researchers in interpretability
- Platforms with unfaithful RAG
- Users relying on blind trust
Increased confidence in information retrieved and cited by RAG-powered LLMs for factoid questions.
Development of more robust and auditable RAG systems with improved control over citation behavior and reduced hallucination rates.
Enhanced regulatory scrutiny and industry standards for AI provenance and attribution, leading to a demand for transparent RAG mechanisms.
Read at arXiv cs.AI