Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning

arXiv:2602.02431v2 Abstract: It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. While this phenomenon has been extensively studied in linear regression, the benefit of multi-pass gradient descent (GD, which reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) is not well-understood in nonlinear and non-convex settings, except for a loss modification mechanism achieved by the first two passes on the data. In this work, we consider lea
The paper addresses a long-standing piece of machine learning folklore about the statistical efficiency of gradient-based methods that reuse training data, offering new theoretical insights in nonlinear and non-convex settings.
This research refines our understanding of fundamental AI optimization techniques, potentially influencing how future AI models are trained and optimized for efficiency and performance.
The findings support the folklore that data reuse helps: in single-index learning, full-batch gradient descent, which makes multiple passes over the same data, can achieve better sample complexity than one-pass SGD, which uses each data point only once.
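To make the two procedures being compared concrete, here is a minimal NumPy sketch that runs one-pass SGD and multi-pass full-batch GD on the same synthetic single-index dataset. The tanh link, Gaussian inputs, squared loss, step sizes, and all function names are illustrative assumptions and are not taken from the paper; the sketch only shows how the two methods differ in data access, and it makes no claim about which one wins in this toy setup.

```python
import numpy as np

def make_data(n, d, rng):
    """Synthetic single-index data y = tanh(<x, w_star>) (assumed toy model)."""
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)  # unit-norm hidden direction
    X = rng.standard_normal((n, d))
    y = np.tanh(X @ w_star)
    return X, y, w_star

def grad(w, X, y):
    """Gradient of the mean of 0.5 * (tanh(<x, w>) - y)^2 over the rows of X."""
    pred = np.tanh(X @ w)
    residual = pred - y
    return X.T @ (residual * (1.0 - pred ** 2)) / len(y)

def one_pass_sgd(X, y, w0, lr):
    """Online SGD: each sample drives exactly one update and is never reused."""
    w = w0.copy()
    for i in range(len(y)):
        w -= lr * grad(w, X[i:i + 1], y[i:i + 1])
    return w

def full_batch_gd(X, y, w0, lr, passes):
    """Full-batch GD: every update uses all samples, repeated `passes` times."""
    w = w0.copy()
    for _ in range(passes):
        w -= lr * grad(w, X, y)
    return w

def overlap(w, w_star):
    """Absolute cosine similarity with the unit-norm hidden direction."""
    return abs(w @ w_star) / np.linalg.norm(w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, w_star = make_data(n=2000, d=20, rng=rng)
    w0 = rng.standard_normal(20) / np.sqrt(20)
    w_sgd = one_pass_sgd(X, y, w0, lr=0.05)
    w_gd = full_batch_gd(X, y, w0, lr=0.5, passes=200)
    print("one-pass SGD overlap: ", overlap(w_sgd, w_star))
    print("full-batch GD overlap:", overlap(w_gd, w_star))
```

Both routines see the same n samples; the only difference is that `full_batch_gd` revisits all of them on every step, while `one_pass_sgd` stops once the data is exhausted.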
- AI researchers
- Machine learning engineers
- Companies with large datasets
- Developers solely relying on one-pass SGD for all applications
Refined theoretical understanding of gradient descent algorithms for AI training.
Improved efficiency and performance in training large and complex AI models through better algorithm selection.
Acceleration of research into novel optimization techniques that leverage multi-pass data access more effectively.
Read at arXiv cs.LG