arXiv:2602.02431v2 Announce Type: replace-cross Abstract: It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. While this phenomenon has been extensively studied in linear regression, the benefit of multi-pass gradient descent (GD, which reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) is not well-understood in nonlinear and non-convex settings, except for a loss modification mechanism achieved by the first two passes on the data. In this work, we consider lea
Source: arXiv cs.LG — read the full report at the original publisher.
