
arXiv:2404.06106v3 Announce Type: replace Abstract: Low-dimensional structures appear ubiquitously in the eigenspectra of deep learning matrices in classification networks trained in the overparameterized regime. While theoretical advances have aimed to explain this phenomenology, they typically succeed only in capturing subsets of the full behavior or rely on assumptions that cannot hold in practice. In this work, we provide an analytic explanation for the bulk-plus-outlier structure of several canonical deep learning matrices, including the Hessian, gradients, and weights. We achieve this us
Read at arXiv cs.LG