Same Concept, Different Directions: Cross-Modal Feature Heterogeneity in Sparse Autoencoders

arXiv:2606.29888v1 (Announce Type: new)

Abstract: Vision-language models map images and text into a joint embedding space. However, these embeddings often entangle multiple semantic features, which limits their interpretability and controllability. While sparse autoencoders have emerged as a useful tool for decomposing these embeddings into monosemantic features, their application to joint embedding spaces has largely relied on an implicit, untested assumption that semantically corresponding features share the same directions across modalities. In this paper, we challenge this assumption by iden…
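The decomposition the abstract describes can be illustrated with a toy sparse autoencoder (SAE) forward pass. This is a minimal sketch, assuming the common formulation of a ReLU encoder, a linear decoder, and an L1 sparsity penalty; the summary does not state the paper's exact architecture, and the dimensions, weights, L1 coefficient, and names such as `W_DEC` are hypothetical. Each column of the decoder matrix is a learned feature direction, so the assumption the authors question roughly amounts to expecting an image and a caption of the same concept to activate the same column.

```python
"""Toy sparse autoencoder (SAE) forward pass in pure Python.

Illustrative sketch only: the dimensions, weights, and L1 coefficient are
hypothetical toy values, not taken from the paper.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i][j]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Sparse code f = ReLU(W_enc (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre]


def sae_decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction x_hat = W_dec f + b_dec; column k of W_dec is feature k's direction."""
    return [r + b for r, b in zip(matvec(w_dec, f), b_dec)]


def sae_loss(x: Vector, x_hat: Vector, f: Vector, l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 sparsity penalty on the code."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return mse + l1_coeff * sum(abs(v) for v in f)


# Hypothetical toy SAE: embedding dim d = 2, dictionary size m = 3.
W_DEC: Matrix = [[1.0, 0.0, -1.0],
                 [0.0, 1.0, 0.0]]                        # d x m; columns are feature directions
W_ENC: Matrix = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # m x d; tied weights (transpose of W_DEC)
B_ENC: Vector = [-0.5, -0.5, -0.5]                       # negative bias suppresses weak activations
B_DEC: Vector = [0.0, 0.0]

if __name__ == "__main__":
    x = [2.0, 0.0]
    f = sae_encode(x, W_ENC, B_ENC, B_DEC)
    x_hat = sae_decode(f, W_DEC, B_DEC)
    print(f, x_hat, sae_loss(x, x_hat, f, l1_coeff=0.1))
```

With these toy values, encoding `x = [2.0, 0.0]` activates only the first feature: the ReLU and the negative encoder bias zero out the other two, which is the kind of sparsity the L1 term encourages during training.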
This paper challenges a foundational assumption in how sparse autoencoders are applied to the joint embedding spaces of vision-language models. It arrives as research continues to push the boundaries of AI interpretability and controllability.
Understanding how corresponding features differ across modalities could improve the interpretability, robustness, and performance of future multimodal models across a range of applications.
By explicitly challenging the assumption that semantically corresponding features share the same directions across modalities, the work points to a new avenue for developing more sophisticated and potentially more effective architectures for multimodal learning.
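One way to probe the shared-direction assumption for a single concept is to estimate a concept direction separately in each modality and compare the two. The sketch below uses a difference-of-means estimate and cosine similarity; this is a generic probing technique chosen for illustration, not necessarily the method used in the paper, and the function names and toy vectors are hypothetical.

```python
"""Hypothetical check of the shared-direction assumption for one concept.

Sketch only: for each modality, estimate a concept direction as the
difference of means between embeddings that contain the concept and
embeddings that do not, then compare the two directions by cosine similarity.
"""
import math
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def concept_direction(with_concept: Sequence[Vector],
                      without_concept: Sequence[Vector]) -> Vector:
    """Difference-of-means direction within a single modality."""
    return [a - b for a, b in zip(mean(with_concept), mean(without_concept))]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; raises if either vector is zero."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("concept direction is the zero vector")
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def cross_modal_alignment(img_pos: Sequence[Vector], img_neg: Sequence[Vector],
                          txt_pos: Sequence[Vector], txt_neg: Sequence[Vector]) -> float:
    """Cosine between the image-side and text-side directions for one concept.

    A value near 1.0 is what the shared-direction assumption predicts;
    noticeably lower values suggest the modalities encode the concept
    along different directions.
    """
    return cosine(concept_direction(img_pos, img_neg),
                  concept_direction(txt_pos, txt_neg))


if __name__ == "__main__":
    # Hypothetical 3-d toy embeddings.
    neg = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
    img_pos = [[1.0, 0.0, 1.0], [1.0, 0.0, -1.0]]  # image direction ~ (1, 0, 0)
    txt_pos = [[0.0, 1.0, 1.0], [0.0, 1.0, -1.0]]  # text direction ~ (0, 1, 0)
    print(cross_modal_alignment(img_pos, neg, txt_pos, neg))
```

Subtracting a per-modality baseline (the `without_concept` mean) keeps any constant offset between image and text embeddings from dominating the comparison, so the score reflects only how each modality encodes the concept itself.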
Likely beneficiaries:
- AI researchers
- Developers of multimodal AI applications
- Companies seeking explainable AI

Potentially disadvantaged:
- Developers relying solely on current implicit assumptions
- Applications with poor interpretability
Improved methods for disentangling semantic features in joint embedding spaces are likely to emerge.
More robust and explainable multimodal AI systems could accelerate adoption in sensitive sectors like healthcare and finance.
A deeper understanding of cross-modal feature representation may lead to more human-like cognitive architectures in AI.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.LG