arXiv:2603.21437v2 Announce Type: replace Abstract: Transformer-based embedding models frequently exhibit geometric pathologies, such as anisotropy and length-induced representation collapse, which can degrade downstream retrieval performance. While prior work often attributes these issues directly to text length or attention mechanisms, we argue that the fundamental drivers are instead the inherent pooling operations coupled with internal semantic shift. In this paper, we establish a unified theoretical framework proving that contextual pooling intrinsically causes embedding collapse. Specifi
Source: arXiv cs.CL — read the full report at the original publisher.
