The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content

arXiv:2606.11198v1 (announce type: new). Abstract: Retrieval-augmented generation (RAG) systems inject external knowledge to improve LLM outputs, yet the format of injected content, as distinct from its semantic relevance, can independently distort the model's attention distribution. We identify and formalise a phenomenon we term the structural attention tax: knowledge graph (KG) triples, due to their relational delimiters and repeated slot patterns, capture 2-3x more attention per token than semantically equivalent natural-language text ($\hat{o}$(KG) $\approx$ 0.70 vs. $\hat{o}$(neutral) $\approx$ …
The paper identifies a format-level bias in how Retrieval-Augmented Generation (RAG) systems process injected knowledge, with direct consequences for the efficacy and trustworthiness of current LLM applications.
The finding suggests that LLMs do not weigh retrieved information on content alone, which can lead to misinterpretation and biased outputs.
If the 'structural attention tax' holds, the format of injected knowledge can matter as much as its content, and RAG system design and data preparation will need to account for it.
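To make "attention per token" concrete, here is a minimal sketch of how one might compare the attention that two context segments receive. It is an illustration under stated assumptions, not the paper's metric: the excerpt above does not define $\hat{o}$, the function name `attention_per_token` and the segment labels are hypothetical, and the attention weights are made-up toy values rather than measurements.

```python
from typing import Dict, Sequence, Tuple


def attention_per_token(
    attn_row: Sequence[float],
    segments: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Mean attention weight per token for each named context segment.

    attn_row: attention weights from a single query position over the
              context tokens (assumed to be one already-normalised row).
    segments: hypothetical mapping of segment name -> (start, end),
              half-open token index ranges into attn_row.
    """
    result: Dict[str, float] = {}
    for name, (start, end) in segments.items():
        if not 0 <= start < end <= len(attn_row):
            raise ValueError(f"invalid range for segment {name!r}: {(start, end)}")
        # Total attention mass on the segment, divided by its length.
        result[name] = sum(attn_row[start:end]) / (end - start)
    return result


# Toy example (made-up weights): four natural-language tokens followed by
# four KG-triple tokens, with the whole row summing to 1.0.
toy_attention = [0.0625] * 4 + [0.1875] * 4
toy_segments = {"text": (0, 4), "kg": (4, 8)}

per_token = attention_per_token(toy_attention, toy_segments)
ratio = per_token["kg"] / per_token["text"]
print(per_token, ratio)
```

Normalising by segment length is the point of the sketch: a segment that is simply longer will collect more total attention, so a format effect only shows up when the comparison is made per token.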
- AI researchers in RAG optimization
- Companies developing advanced RAG interfaces
- Developers of new knowledge representation formats
- Current RAG implementations over-relying on knowledge graphs
- Organizations using raw, untuned knowledge sources for RAG
- LLM application developers without careful data engineering
Immediate re-evaluation of RAG implementation strategies, particularly knowledge graph integration, to mitigate the attention tax.
Development of new retrieval and injection formats that optimize for attention distribution, potentially favoring natural language or hybrid representations.
Enhanced trust in LLM outputs as RAG systems become more robust to format-induced biases, leading to broader enterprise adoption for critical applications.
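One way to picture the "natural language or hybrid representations" mentioned above is to verbalise KG triples into plain sentences before injecting them into the prompt. The sketch below is a simple template-based illustration and an assumption on our part; the source does not describe a specific conversion method, and the names `verbalize` and `verbalize_all` and the underscore-separated predicate convention are hypothetical.

```python
from typing import Iterable, Tuple

# A knowledge-graph triple as (subject, predicate, object).
Triple = Tuple[str, str, str]


def verbalize(triple: Triple) -> str:
    """Turn one triple into a plain sentence.

    Assumes (hypothetically) that predicates are underscore-separated
    verb phrases such as "was_born_in".
    """
    subject, predicate, obj = triple
    relation = predicate.replace("_", " ").strip()
    return f"{subject} {relation} {obj}."


def verbalize_all(triples: Iterable[Triple]) -> str:
    """Join verbalised triples into one natural-language passage."""
    return " ".join(verbalize(t) for t in triples)


triples = [
    ("Marie Curie", "was_born_in", "Warsaw"),
    ("Marie Curie", "won", "the Nobel Prize in Physics"),
]

passage = verbalize_all(triples)
print(passage)
```

Whether this kind of rewriting actually reduces the attention skew is an empirical question that the excerpt does not settle; the sketch only shows where such a step would sit in a RAG pipeline.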
Read at arXiv cs.CL