The Cross-Architecture Substrate: A Domain-Transcendent, Calibration-Surviving Geometric Invariant of Modern Vision Encoders

arXiv:2606.07882v1 (announce type: cross)

Abstract: Different vision neural networks -- trained to classify, contrast, reconstruct, or match images to text -- should have correspondingly different internal representations. We report that they do not. After training, the top sixteen principal directions of variation inside thirteen modern vision encoders converge to the same sixteen-dimensional geometric object. We call this the cross-architecture substrate and study it with PCA, centred kernel alignment (CKA), and Pang 2026 calibration. The substrate transports across four visual domains (natura
The proliferation and diversity of advanced vision models allow comparative analysis of their internal representations, which reveals common underlying structures.
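As a rough illustration of the kind of comparison the abstract names (PCA followed by CKA), here is a minimal sketch on synthetic data. It is not the paper's pipeline: the helper names `top_k_pca_scores` and `linear_cka`, the toy "encoders" (random orthonormal embeddings of a shared 16-dimensional latent), and the sample sizes are all assumptions made for illustration, and the calibration step mentioned in the abstract is not modelled.

```python
import numpy as np


def top_k_pca_scores(features, k=16):
    """Project (n_samples, n_dims) features onto their top-k principal directions."""
    centered = features - features.mean(axis=0, keepdims=True)
    # With centered = U @ diag(S) @ Vt, the PCA scores are U[:, :k] * S[:k].
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def linear_cka(x, y):
    """Linear centred kernel alignment between two (n_samples, dims) matrices."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))


def orthonormal_embedding(rng, latent_dim, out_dim):
    """Hypothetical stand-in for an encoder: a (latent_dim, out_dim) map with orthonormal rows."""
    q, _ = np.linalg.qr(rng.standard_normal((out_dim, latent_dim)))
    return q.T


rng = np.random.default_rng(0)
n_samples, latent_dim = 500, 16

# Two toy "encoders" that embed the same latent structure in different widths.
shared_latent = rng.standard_normal((n_samples, latent_dim))
feats_a = shared_latent @ orthonormal_embedding(rng, latent_dim, 64)
feats_b = shared_latent @ orthonormal_embedding(rng, latent_dim, 128)
feats_a = feats_a + 0.01 * rng.standard_normal(feats_a.shape)
feats_b = feats_b + 0.01 * rng.standard_normal(feats_b.shape)

# A third toy "encoder" built from an unrelated latent, as a control.
other_latent = rng.standard_normal((n_samples, latent_dim))
feats_c = other_latent @ orthonormal_embedding(rng, latent_dim, 64)

cka_shared = linear_cka(top_k_pca_scores(feats_a), top_k_pca_scores(feats_b))
cka_unrelated = linear_cka(top_k_pca_scores(feats_a), top_k_pca_scores(feats_c))
print(f"shared latent:    CKA = {cka_shared:.3f}")
print(f"unrelated latent: CKA = {cka_unrelated:.3f}")
```

In this toy setup the two encoders that share a latent should score close to 1 and the control close to 0. Linear CKA is invariant to rotations and uniform scaling of the features but not to arbitrary linear maps, which is why the sketch uses orthonormal embeddings.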
This finding suggests a fundamental, shared mechanism across diverse vision AI architectures and tasks, which could lead to more robust, generalizable, and efficient AI systems.
Understanding of the internal representations of AI models expands, potentially shifting AI development from bespoke model engineering towards discovering and leveraging universal representational substrates.
- AI researchers (fundamental)
- AI developers (generalization)
- Hardware manufacturers (efficiency gains)
- Over-specialized AI model developers
Identification of a common 'substrate' across varied vision encoders, suggesting a foundational shared learning mechanism.
This shared substrate could enable the development of more efficient multi-task or general-purpose AI architectures.
A deeper understanding of this substrate might lead to novel methods for AI calibration, interpretability, and potentially, a theory of vision intelligence.
Read at arXiv cs.AI