arXiv:2605.18528v2 Announce Type: replace-cross Abstract: A growing lesson from neural network optimization is that optimizer design should respect how the model is parametrized. Scale-invariant methods become important because their normalized layerwise updates can not only support hyperparameter transfer across model sizes but exploit input-output matrix norm geometry. At the same time, stochastic gradient noises in deep learning are often far from sub-Gaussian and may exhibit heavy tails. These crucial observations have shaped recent algorithmic principles for training neural networks, yet
Source: arXiv cs.LG — read the full report at the original publisher.
