
arXiv:2507.11768v3 Announce Type: replace-cross
Abstract: Bayesian accounts of in-context learning face a direct objection: exact posterior predictives for exchangeable data are invariant to task-preserving order, yet transformers change next-token probabilities when the same examples are serialized differently. We show this objection targets a structural invariant rather than the quantity scoring online prediction. For any Bayesian reference, excess prequential code length is exactly cumulative predictive KL. For unordered support sets that must be serialized, the expected regret of a single
Theoretical work on large language models continues to refine our understanding of their capabilities and limits.
This paper addresses the debate over whether in-context learning in LLMs is Bayesian, a question that bears on how these systems are designed, interpreted, and trusted.
The theory of LLM inference increasingly separates ideal Bayesian behavior from what transformers actually compute, a distinction that could guide future architectural improvements.
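The two claims quoted in the abstract can be checked on a toy problem. The sketch below is not the paper's construction: it assumes a Beta-Bernoulli model as the Bayesian reference, uses a hypothetical recency-weighted predictor (`recency_predict_one`) as a stand-in for an order-sensitive model, and reads "excess prequential code length" as an expectation under the reference. Under those assumptions the identity reduces to the chain rule for KL divergence, which can be verified by enumerating every binary sequence of a short length.

```python
import itertools
import math


def bayes_predict_one(prefix, a=1.0, b=1.0):
    """Beta(a, b)-Bernoulli posterior predictive P(next = 1 | prefix)."""
    return (a + sum(prefix)) / (a + b + len(prefix))


def recency_predict_one(prefix, decay=0.5):
    """Hypothetical order-sensitive predictor: recent symbols weigh more."""
    ones, zeros = 1.0, 1.0  # pseudo-counts keep probabilities in (0, 1)
    for age, symbol in enumerate(reversed(prefix)):
        weight = decay ** age
        if symbol == 1:
            ones += weight
        else:
            zeros += weight
    return ones / (ones + zeros)


def sequence_prob(seq, predict_one):
    """Probability of a whole sequence via the chain rule of its predictor."""
    prob = 1.0
    for t, symbol in enumerate(seq):
        p1 = predict_one(seq[:t])
        prob *= p1 if symbol == 1 else 1.0 - p1
    return prob


def bernoulli_kl(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) in nats."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def expected_excess_code_length(n):
    """E_P[-log Q(x) + log P(x)] over all binary sequences of length n."""
    total = 0.0
    for seq in itertools.product((0, 1), repeat=n):
        p = sequence_prob(seq, bayes_predict_one)
        q = sequence_prob(seq, recency_predict_one)
        total += p * (math.log(p) - math.log(q))
    return total


def cumulative_predictive_kl(n):
    """Sum over steps of the P-expected KL between one-step predictives."""
    total = 0.0
    for t in range(n):
        for prefix in itertools.product((0, 1), repeat=t):
            weight = sequence_prob(prefix, bayes_predict_one)
            total += weight * bernoulli_kl(
                bayes_predict_one(prefix), recency_predict_one(prefix)
            )
    return total
```

Calling `expected_excess_code_length(6)` and `cumulative_predictive_kl(6)` should return the same value up to floating-point error. Comparing `bayes_predict_one((1, 0, 0))` with `bayes_predict_one((0, 0, 1))` shows the order invariance of the exchangeable reference, while the same comparison with `recency_predict_one` shows a predictor whose output depends on serialization order.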
- AI researchers
- ML architects
- Bayesian learning practitioners
- Oversimplified interpretations of LLM learning
The paper provides a more nuanced theoretical framework for understanding how large language models learn in context.
This improved understanding could lead to more robust and explainable AI models, potentially reducing unexpected behaviors.
In the long term, a stronger theoretical foundation might accelerate progress toward artificial general intelligence by clarifying learning mechanisms.
Read at arXiv cs.LG