Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent

arXiv:2602.03001v2. Abstract: To maximize hardware utilization, modern machine learning systems typically employ large constant or manually tuned batch size schedules, relying on heuristics that are brittle and costly to tune. Existing adaptive strategies based on gradient noise scale (GNS) offer a principled alternative. However, their assumption of SGD's Euclidean geometry creates a fundamental mismatch with popular optimizers based on generalized norms, such as signSGD / Signum ($\ell_\infty$) and stochastic spectral descent (specSGD) / Muon ($\mathcal{S}_\infty$). In
The paper addresses a known limitation of current adaptive batch sizing techniques: they assume SGD's Euclidean geometry and therefore do not match the non-Euclidean optimizers widely used in machine learning. It proposes a principled alternative.
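For readers unfamiliar with the geometries named in the abstract, the block below writes out the three norms and the unit-norm steepest-descent direction each one induces. This is the standard textbook characterization, not a formula taken from the paper; the symbols $g$ (a gradient vector), $G$ (a gradient matrix with reduced SVD $U \operatorname{diag}(\sigma) V^\top$), and $d$, $D$ (update directions) are introduced here for illustration only.

```latex
\begin{align*}
\text{SGD } (\ell_2):\quad
  & \|g\|_2 = \Big(\sum_i g_i^2\Big)^{1/2},
  && \arg\min_{\|d\|_2 \le 1} \langle g, d\rangle = -\frac{g}{\|g\|_2} \\[4pt]
\text{signSGD } (\ell_\infty):\quad
  & \|g\|_\infty = \max_i |g_i|,
  && \arg\min_{\|d\|_\infty \le 1} \langle g, d\rangle = -\operatorname{sign}(g) \\[4pt]
\text{specSGD } (\mathcal{S}_\infty):\quad
  & \|G\|_{\mathcal{S}_\infty} = \max_i \sigma_i,
  && \arg\min_{\|D\|_{\mathcal{S}_\infty} \le 1} \langle G, D\rangle = -U V^\top
\end{align*}
```

In each case the minimum value is the negative of the dual norm of the gradient: $\|g\|_2$ for $\ell_2$, $\|g\|_1 = \sum_i |g_i|$ for $\ell_\infty$, and the nuclear norm $\sum_i \sigma_i$ for $\mathcal{S}_\infty$. A noise measure built on the Euclidean norm therefore tracks a different quantity from the one these optimizers actually descend along.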
The approach adapts batch sizes dynamically during training, which can shorten training time and give more consistent performance across the optimizers it targets.
Machine learning systems can potentially achieve higher hardware utilization and more stable training convergence by employing adaptive batch sizing methods that are compatible with advanced optimizers like signSGD and specSGD.
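To make the idea of a gradient noise scale concrete, here is a minimal sketch of the existing Euclidean version that the abstract contrasts against: the widely used "simple" noise scale, the trace of the per-example gradient covariance divided by the squared Euclidean norm of the mean gradient. It is not the paper's non-Euclidean estimator. The function names `simple_noise_scale` and `choose_batch_size`, the clamping bounds, and the rule "set the batch size to the rounded noise scale" are all hypothetical choices made for illustration.

```python
import math


def simple_noise_scale(per_example_grads):
    """Estimate tr(Sigma) / ||G||_2^2 from per-example gradient vectors.

    per_example_grads: list of equal-length lists of floats, one per example.
    This is the Euclidean quantity only (an illustrative assumption).
    """
    n = len(per_example_grads)
    if n < 2:
        raise ValueError("need at least two per-example gradients")
    dim = len(per_example_grads[0])

    # Mean gradient, coordinate by coordinate.
    mean = [sum(g[j] for g in per_example_grads) / n for j in range(dim)]

    # Sum of unbiased per-coordinate variances, an estimate of tr(Sigma).
    trace_sigma = sum(
        sum((g[j] - mean[j]) ** 2 for g in per_example_grads) / (n - 1)
        for j in range(dim)
    )

    # ||mean||^2 overestimates ||G||^2 by tr(Sigma)/n in expectation,
    # so subtract that term to reduce the bias.
    sq_norm = sum(m * m for m in mean) - trace_sigma / n
    if sq_norm <= 0.0:
        return math.inf  # gradient signal is indistinguishable from noise
    return trace_sigma / sq_norm


def choose_batch_size(noise_scale, min_batch=32, max_batch=4096):
    """Hypothetical rule: use the noise scale as the batch size, clamped."""
    if math.isinf(noise_scale):
        return max_batch
    return max(min_batch, min(max_batch, int(round(noise_scale))))


if __name__ == "__main__":
    grads = [[1.0, 0.0], [3.0, 0.0]]
    scale = simple_noise_scale(grads)  # 2/3 for this toy input
    print(scale, choose_batch_size(scale))
```

Noisier gradients relative to the mean push the estimate, and so the suggested batch size, upward. Since both the numerator and the denominator are defined through the Euclidean norm, this estimator inherits the geometry mismatch described above when paired with signSGD or specSGD.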
- Machine learning researchers
- AI compute infrastructure providers
- Deep learning framework developers
- Manual hyperparameter tuning practitioners
- Inefficient ML training pipelines
More efficient and faster training of large-scale AI models.
Reduced computational costs for developing and deploying new AI applications, particularly those reliant on non-Euclidean optimizers.
Acceleration of research and development in AI, making more complex models feasible to train on existing hardware at scale.
Read at arXiv cs.LG