
arXiv:2606.15054v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) detect features via inner product, so a feature's activation scales with both its directional alignment and the input's norm. Under BatchTopK, high-norm tokens inflate all pre-activations simultaneously, claiming dictionary slots regardless of content alignment. This matters because sublayer normalization has already discarded the magnitude the score measures, so the encoder detects a quantity the model does not read. We replace the score with a learned blend of cosine similarity and input magnitude, letting the optimiz
The paper targets a specific limitation of sparse autoencoders (SAEs): inner-product scoring ties a feature's activation to the input's norm as well as its direction, which distorts feature detection under BatchTopK. It reflects a broader push toward more efficient and robust AI models.
The research aims to make sparse autoencoders use their dictionary capacity more effectively. SAEs are widely used to interpret the internal activations of large language models, so better feature detection could lead to more interpretable and resource-efficient models.
The proposed change replaces inner-product scoring with a learned blend of cosine similarity and input magnitude. The intent is for SAEs to score features by directional alignment and to reduce the unwanted sensitivity to input norm that lets high-norm tokens claim dictionary slots regardless of content.
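The norm sensitivity described above can be illustrated with a small sketch. The source abstract is truncated before it gives the actual blend, so the `blended_score` function below and its fixed exponent `alpha` are hypothetical stand-ins chosen for illustration, not the paper's formula; in the paper the blend is learned rather than set by hand. The selection rule assumes the usual BatchTopK convention of keeping the `k * batch_size` largest scores across the whole batch.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def inner_product_score(w, x):
    # Standard encoder score (bias omitted): <w, x> = |w| * |x| * cos(w, x),
    # so it grows with the input norm as well as with alignment.
    return dot(w, x)


def blended_score(w, x, alpha):
    # HYPOTHETICAL blend (an assumption, not taken from the paper):
    # cosine similarity scaled by |x| ** alpha.
    # alpha = 0 gives pure cosine; alpha = 1 with unit-norm w gives <w, x>.
    cosine = dot(w, x) / (norm(w) * norm(x))
    return cosine * norm(x) ** alpha


def batch_top_k(scores, k):
    # Keep the k * batch_size largest scores over all (token, feature) pairs.
    budget = k * len(scores)
    flat = [(s, t, f) for t, row in enumerate(scores) for f, s in enumerate(row)]
    flat.sort(reverse=True)
    return sorted((t, f) for _, t, f in flat[:budget])


# Two unit-norm dictionary directions (toy values).
features = [[1.0, 0.0], [0.0, 1.0]]

# Token 0: norm 1, perfectly aligned with feature 0.
# Token 1: norm 10, only partly aligned with either feature.
tokens = [[1.0, 0.0], [6.0, 8.0]]

ip_scores = [[inner_product_score(w, x) for w in features] for x in tokens]
cos_scores = [[blended_score(w, x, 0.0) for w in features] for x in tokens]

ip_slots = batch_top_k(ip_scores, k=1)    # [(1, 0), (1, 1)]
cos_slots = batch_top_k(cos_scores, k=1)  # [(0, 0), (1, 1)]
```

With inner-product scores, the high-norm token takes both available slots and the well-aligned token gets none. With the magnitude term switched off (`alpha = 0`), the aligned token keeps its slot. A learned blend would sit somewhere between these two extremes.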
- AI researchers
- Open-source AI community
- Companies using large language models
- Developers of interpretable AI systems
- Inefficient SAE models
- Researchers relying on older SAE scoring mechanisms
Improved performance and interpretability of sparse autoencoders across various AI applications.
Accelerated development of more robust large language models that hallucinate less, owing to better feature disentanglement.
Enhanced AI safety and auditability as the interpretability of complex neural networks improves through more precise feature detection.
Read at arXiv cs.LG