
arXiv:2510.27118v4 Announce Type: replace
Abstract: Most expressivity results for transformers treat them as language recognizers -- devices that accept or reject strings -- rather than as they are used in practice: as language models that generate strings autoregressively and probabilistically. We characterize the probability distributions that transformer language models can express. We show that making transformer language recognizers autoregressive can sometimes increase their expressivity, and that making them probabilistic can break equivalences that hold in the non-probabilistic case.
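The distinction the abstract draws between a recognizer and an autoregressive, probabilistic language model can be made concrete with a toy sketch. This is not the paper's construction: the names `recognizer`, `next_token_probs`, `string_prob`, and the `EOS` symbol are hypothetical, and the hand-written conditional distribution stands in for whatever a transformer would compute. It assumes only the standard chain-rule definition, in which a string's probability is the product of its next-symbol probabilities, ending with the probability of stopping.

```python
from typing import Dict

EOS = "<eos>"  # hypothetical end-of-string symbol


def recognizer(s: str) -> bool:
    """Toy recognizer: accept exactly the strings made only of 'a'."""
    return all(ch == "a" for ch in s)


def next_token_probs(prefix: str) -> Dict[str, float]:
    """Toy next-symbol distribution (assumed for illustration).

    It ignores the prefix: continue with 'a' or stop, each with
    probability 0.5, and never emit 'b'.
    """
    return {"a": 0.5, "b": 0.0, EOS: 0.5}


def string_prob(s: str) -> float:
    """Probability of generating s and then stopping (chain rule)."""
    p = 1.0
    for i, ch in enumerate(s):
        # Multiply in the probability of symbol ch given the prefix s[:i].
        p *= next_token_probs(s[:i]).get(ch, 0.0)
    # A complete string must be followed by the end-of-string symbol.
    return p * next_token_probs(s)[EOS]
```

Here the recognizer returns only accept or reject, while the language model assigns each string a weight: `string_prob("aa")` is 0.5 * 0.5 * 0.5 = 0.125, and the weights of all strings of the form a^n sum to 1. Two models that accept the same set of strings may still assign different probabilities, which is why equivalences between recognizers need not carry over to the probabilistic setting.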
This research arrives as transformer-based language models become central to a wide range of applications, which calls for a deeper theoretical understanding of their capabilities and limitations.
A more precise account of which probability distributions transformers can express could inform model design and safety, and could support more reliable and predictable models.
The work moves the theory of transformer expressivity beyond language recognition to the autoregressive, probabilistic setting, where the picture changes: autoregression can sometimes increase expressivity, and probabilities can break equivalences that hold in the non-probabilistic case.
- AI researchers
- AI model developers
- NLP applications
- Developers relying on simplistic transformer assumptions
Improved theoretical foundations for transformer-based AI models.
Possible development of transformers with more predictable probabilistic outputs, which could contribute to more robust and less 'hallucinatory' AI systems.
Potential for new transformer architectures optimized for specific probabilistic tasks, expanding the range of AI applications.
Read at arXiv cs.CL