Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m

arXiv:2605.24577v1 (cross-listed). Abstract: Independently trained transformers compute the same function in residual-stream bases that differ by a uniform random rotation on $\mathrm{SO}(d_{\mathrm{model}})$. We call this phenomenon polymorphism: same function, mutually unintelligible interior coordinates. One matrix multiplication per model pair removes it: an orthogonal Procrustes fit on a single batch of activations transfers sparse-autoencoder feature dictionaries and steering vectors between independently trained models, with no retraining. The phenomenon is invisible to the standar
This research emerges as the AI community increasingly focuses on interpretability and transfer learning between large language models, driven by the need for more efficient and generalizable AI systems.
A strategic reader should care because better mechanistic interpretability and cross-model transfer could speed up AI development, lower training costs, and make AI applications more robust and safer.
If feature dictionaries and steering vectors can be transferred directly between independently trained models without retraining, simply by correcting for 'polymorphism' with one orthogonal Procrustes fit per model pair, it would change how AI models are analyzed, understood, and integrated.
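To make the alignment step described in the abstract concrete, here is a minimal sketch of an orthogonal Procrustes fit in NumPy. It is not the paper's code: the activations are synthetic, the second "model" is simulated by applying a hidden orthogonal matrix to the first, and every name (`procrustes_rotation`, `acts_a`, `acts_b`, `steer_a`, `dict_a`) is a hypothetical chosen for illustration.

```python
import numpy as np


def procrustes_rotation(acts_a, acts_b):
    """Return the orthogonal R minimizing ||acts_a @ R - acts_b||_F.

    Both inputs have shape (n_samples, d_model), with rows paired on the
    same inputs. This is the standard SVD solution to orthogonal Procrustes.
    """
    # Cross-covariance between the two activation batches, shape (d, d).
    cross = acts_a.T @ acts_b
    u, _, vt = np.linalg.svd(cross)
    return u @ vt


rng = np.random.default_rng(0)
d_model, n_samples = 16, 256

# Assumption: model B's residual stream equals model A's up to a hidden
# orthogonal change of basis (a synthetic stand-in for "polymorphism").
hidden_q, _ = np.linalg.qr(rng.standard_normal((d_model, d_model)))
acts_a = rng.standard_normal((n_samples, d_model))
acts_b = acts_a @ hidden_q

# Fit the alignment on a single batch of paired activations.
r = procrustes_rotation(acts_a, acts_b)

# Transfer a steering vector and a toy feature dictionary (rows are
# feature directions) from A's basis into B's basis, with no retraining.
steer_a = rng.standard_normal(d_model)
dict_a = rng.standard_normal((8, d_model))
steer_b = steer_a @ r
dict_b = dict_a @ r

recovery_error = np.linalg.norm(r - hidden_q)
```

In this noiseless toy setting the fitted matrix recovers the hidden change of basis up to floating-point error. With real models the fit would only be approximate, and how well transferred features and steering vectors behave is an empirical question the paper addresses, not something this sketch demonstrates. The SVD solution also ranges over all orthogonal matrices, reflections included, rather than only rotations.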
- AI researchers
- AI developers
- AI safety organizations
- Cloud compute providers
- AI model retraining costs
- Overly specialized AI development workflows
- Lack of transparency in AI models
This research enables a more efficient transfer of interpretability tools and findings between diverse AI models.
It could lead to a 'parts library' for AI models, where functional components are understood and transferable, accelerating AI innovation.
This might enable breakthroughs in understanding and controlling emergent AI behaviors across different model architectures, leading to more robust and ethical AI systems.
Read at arXiv cs.CL