arXiv:2605.27476v1 Announce Type: new Abstract: We characterize the pre-softmax attention matrix $\mathbf{QK^\top}$ in transformers as an associative memory matrix encoding pairwise associations between input features. By decomposing this matrix into its symmetric and skew-symmetric parts, we interpret the symmetric component as governing the structure of the energy landscape, and the skew-symmetric component as driving circulation on that landscape. Leveraging the energy formulation induced by the symmetric component, we derive Hopfield-style stability measures that quantify the stability of
Source: arXiv cs.LG — read the full report at the original publisher.
