arXiv:2605.23033v1 Announce Type: new Abstract: Foundational Models pretrained on huge amount of data learn representations that evolve across depth, forming a hierarchy of embeddings with distinct semantic content and geometric structure. Contrary to the widespread practice of using only the final layer or shallow mixtures, we show that task-relevant information is distributed non-monotonically across layers and cannot be recovered by na\"ive aggregation. Through a geometric and empirical study across multiple modalities, we show that effective transfer depends on identifying which layers enc
Source: arXiv cs.LG — read the full report at the original publisher.
