Feature Learning in Wide Neural Networks under $\mu$P: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit

arXiv:2605.24710v1. Abstract: We establish four structural results for feature learning in wide two-layer neural networks under the Maximal Update Parametrization ($\mu$P). First, we prove global existence and uniqueness of the mean-field limit of noisy gradient descent under $\mu$P, identifying the maximal admissible weight $w^*$ on the moment sequence of the initialization as the reciprocal parameter-moment-growth boundary, and hence the largest weighted moment class propagated by the flow. The finite-particle approximation has uniform-in-time squared-Wasserstein rate $O(N^
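The finite-particle noisy gradient descent that the abstract refers to can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar-input network $f(x) = \frac{1}{N}\sum_i a_i \tanh(w_i x)$, the toy regression target, the Gaussian initialization, and the names `predict`, `loss`, `train`, `lr`, and `tau`. The sketch uses the mean-field scaling commonly associated with $\mu$P for two-layer networks, in which each particle's gradient is multiplied by $N$ so that it moves at an $O(1)$ rate; the paper's exact parametrization may differ.

```python
import math
import random


def predict(a, w, x):
    """Mean-field two-layer network: f(x) = (1/N) * sum_i a_i * tanh(w_i * x)."""
    return sum(ai * math.tanh(wi * x) for ai, wi in zip(a, w)) / len(a)


def loss(a, w, data):
    """Squared error with a 1/2 factor, averaged over the dataset."""
    return sum(0.5 * (predict(a, w, x) - y) ** 2 for x, y in data) / len(data)


def train(n_particles=64, steps=300, lr=0.1, tau=1e-3, seed=0):
    """Noisy gradient descent on N particles (a_i, w_i); tau is the noise temperature."""
    rng = random.Random(seed)
    # Toy 1-D regression target (hypothetical, chosen only for illustration).
    data = [(x, math.tanh(2.0 * x)) for x in [i / 10.0 - 1.0 for i in range(21)]]
    # i.i.d. Gaussian initialization of the particles.
    a = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    w = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    noise_scale = math.sqrt(2.0 * lr * tau)  # Langevin-style noise per step
    losses = [loss(a, w, data)]
    for _ in range(steps):
        residuals = [(x, predict(a, w, x) - y) for x, y in data]
        new_a, new_w = [], []
        for ai, wi in zip(a, w):
            # Per-particle drift: N times the loss gradient, so the 1/N from
            # the output scaling cancels and each particle moves at rate O(1).
            grad_a = sum(r * math.tanh(wi * x) for x, r in residuals) / len(data)
            grad_w = sum(r * ai * (1.0 - math.tanh(wi * x) ** 2) * x
                         for x, r in residuals) / len(data)
            new_a.append(ai - lr * grad_a + noise_scale * rng.gauss(0.0, 1.0))
            new_w.append(wi - lr * grad_w + noise_scale * rng.gauss(0.0, 1.0))
        a, w = new_a, new_w
        losses.append(loss(a, w, data))
    return a, w, losses


if __name__ == "__main__":
    _, _, history = train()
    print("initial loss:", history[0], "final loss:", history[-1])
```

The empirical distribution of the pairs `(a_i, w_i)` is the object whose large-`N` behavior a mean-field limit describes; setting `tau=0.0` recovers plain gradient descent on the same particles.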
This research gives a theoretical account of feature learning in wide two-layer neural networks under $\mu$P, a mechanism central to the performance and scalability of modern AI systems.
A deeper understanding of feature learning mechanisms in neural networks allows for more efficient, predictable, and robust AI model development, potentially accelerating AI progress.
This theoretical advance offers new insights into how wide neural networks learn features, which could inform future architectural designs and training methodologies for more powerful AI.
- AI researchers
- Deep learning framework developers
- Companies building advanced AI models
- Empirical-only AI development approaches
Improved theoretical guarantees and understanding of neural network training dynamics will emerge.
New AI architectures and training algorithms could be developed based on these theoretical insights.
The development of more explainable, robust, and generalizable AI systems could accelerate, leading to broader AI adoption and impact.
Read at arXiv cs.LG