Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients

arXiv:2512.02342v3 (announce type: replace-cross)

Abstract: The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or requiring knowledge of the optimal solution. In this work, we propose a novel SPS variant, Safeguarded SPS (SPS$_{safe}$), for
This research is part of ongoing efforts to improve optimization algorithms, a fundamental component of machine learning. The arXiv entry is a revised version (v3) of a previously posted paper.
Improved optimization techniques can lead to more efficient and robust machine learning models, but this specific advancement is incremental within the academic research cycle.
The proposed SPS$_{safe}$ offers a potentially more robust optimization approach for non-smooth problems compared to previous stochastic Polyak step size variants.
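For context, the classical capped stochastic Polyak step (often written SPS$_{max}$ in the literature) can be sketched as below on a small non-smooth problem. This is not the SPS$_{safe}$ rule from the paper, whose definition is not given in this summary. The function names, the defaults `c=1.0` and `gamma_max=1.0`, the zero-gradient guard, and the toy objective are all illustrative assumptions, and the per-sample optimum `loss_i_star` is assumed known (zero under interpolation), which is exactly the kind of assumption the abstract flags as a limitation.

```python
import numpy as np

def sps_max_step(loss_i, loss_i_star, grad_i, c=1.0, gamma_max=1.0, eps=1e-12):
    """Classical capped Polyak step: min((f_i(x) - f_i*) / (c * ||g_i||^2), gamma_max).

    Hypothetical helper for illustration; not the paper's SPS_safe rule.
    """
    g2 = float(np.dot(grad_i, grad_i))
    if g2 <= eps:
        # Zero (sub)gradient: no usable direction, so skip the update (an assumption).
        return 0.0
    return min((loss_i - loss_i_star) / (c * g2), gamma_max)

def sgd_sps(A, b, x0, n_steps=200, seed=0, **step_kwargs):
    """SGD on the non-smooth objective f(x) = mean_i |a_i^T x - b_i|.

    Assumes a consistent system A x = b, so every per-sample optimum f_i* is 0.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        i = rng.integers(len(b))
        r = A[i] @ x - b[i]
        g = np.sign(r) * A[i]          # a subgradient of |a_i^T x - b_i|
        x -= sps_max_step(abs(r), 0.0, g, **step_kwargs) * g
    return x
```

Without the cap `gamma_max`, the step grows like the inverse squared norm of the subgradient, so a very small subgradient with a non-negligible loss gap produces a very large step. That sensitivity is the behavior the paper's title alludes to.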
- AI researchers
- Machine learning practitioners
Expect slight improvements in the training efficiency and stability of certain machine learning models.
Broader adoption of robust optimization methods could reduce edge-case failures in AI-driven systems where non-smooth properties are prevalent.
Advances in foundational optimization could eventually contribute to more computationally efficient AI, indirectly easing energy or compute bottlenecks over a very long timeframe.
Read at arXiv cs.LG