Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

arXiv:2604.09412v2. Abstract: We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierar
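
To make the phrase "summary statistics" concrete, here is a minimal sketch of how the population loss of such a network can be computed from inner products between weight vectors alone. It rests on assumptions that are not stated in the abstract: standard Gaussian covariates $x \sim \mathcal{N}(0, I_d)$, a squared loss with a factor $1/2$, and overlap matrices (student-student, student-teacher, teacher-teacher) as the summary statistics. The paper's own parameterisation may differ. The function names are hypothetical. The closed form used is the standard identity $\mathbb{E}[\mathrm{ReLU}(a^\top x)\,\mathrm{ReLU}(b^\top x)] = \frac{\|a\|\|b\|}{2\pi}\left(\sin\theta + (\pi - \theta)\cos\theta\right)$, where $\theta$ is the angle between $a$ and $b$.

```python
import math


def relu_kernel(qaa, qbb, qab):
    """E[ReLU(a.x) ReLU(b.x)] for x ~ N(0, I), given overlaps a.a, b.b, a.b."""
    norm = math.sqrt(qaa * qbb)
    if norm == 0.0:
        return 0.0
    # Clamp so rounding cannot push the cosine outside [-1, 1].
    cos_t = max(-1.0, min(1.0, qab / norm))
    theta = math.acos(cos_t)
    return norm * (math.sin(theta) + (math.pi - theta) * cos_t) / (2.0 * math.pi)


def overlaps(A, B):
    """Matrix of inner products between the rows of A and the rows of B."""
    return [[sum(ai * bi for ai, bi in zip(a, b)) for b in B] for a in A]


def population_loss(Q, M, P):
    """0.5 * E[(student(x) - teacher(x))^2], computed from overlaps only.

    Q: student-student overlaps, M: student-teacher, P: teacher-teacher.
    The 0.5 normalisation is an assumption of this sketch.
    """
    K, Ks = len(Q), len(P)
    ss = sum(relu_kernel(Q[j][j], Q[k][k], Q[j][k])
             for j in range(K) for k in range(K))
    st = sum(relu_kernel(Q[j][j], P[m][m], M[j][m])
             for j in range(K) for m in range(Ks))
    tt = sum(relu_kernel(P[m][m], P[n][n], P[m][n])
             for m in range(Ks) for n in range(Ks))
    return 0.5 * (ss - 2.0 * st + tt)


# Illustrative weights (not from the paper): the student is a permutation
# of the teacher, so the population loss is zero up to rounding.
W_star = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
loss_perm = population_loss(overlaps(W, W), overlaps(W, W_star),
                            overlaps(W_star, W_star))
```

Whatever the input dimension $d$, the loss here depends only on the entries of the three overlap matrices, which is the sense in which a low-dimensional description of the landscape is possible under these assumptions.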
The paper characterises the local minima of the population loss for two-layer ReLU networks through a low-dimensional set of summary statistics, and links those minima to attractive fixed points of one-pass SGD.
A clearer picture of where these local minima lie could help in designing more robust and efficient training algorithms.
This theoretical insight could inform the design of future optimization techniques, potentially reducing training instability and improving model performance.
- AI researchers
- Machine learning engineers
- Deep learning framework developers
Improved theoretical understanding of deep neural network training dynamics.
Development of more stable and efficient algorithms for training large-scale AI models.
Accelerated progress in areas reliant on deep learning, potentially opening new application frontiers.
Read at arXiv cs.LG