
arXiv:2605.24316v1. Abstract: Scaling laws provide compact descriptions of how prediction error varies with compute, model size, and data, but existing theory mainly treats single-sample SGD or full data reuse, leaving the role of mini-batching unclear. We study batch scaling laws for sketched linear regression under a power-law covariance spectrum and a source condition on the target parameter. We analyze one-pass batch SGD, multi-pass batch SGD with replacement, and multi-pass batch SGD without replacement. Our first result is a risk decomposition: all three procedures shar…
The paper addresses a gap in scaling-law theory: existing analyses mainly cover single-sample SGD or full data reuse, leaving the role of mini-batching unclear. That gap matters more as AI models grow larger and efficient data handling becomes a practical constraint.
Improved theoretical understanding of batch scaling laws can lead to more efficient and reliable training of large AI models, reducing computational waste and accelerating model development.
This research provides a clearer framework for optimizing batch SGD in sketched linear regression, potentially guiding the design of more performant and resource-efficient learning algorithms.
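To make the three procedures named in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the exponents `a` and `b`, the Gaussian sketch matrix `S`, the step size, and every function name are hypothetical. "With replacement" is read here as drawing batch indices uniformly at random from a fixed dataset, and "without replacement" as reshuffling the dataset once per epoch.

```python
import numpy as np

def make_problem(d, m, a, b, rng):
    """Build a toy problem: power-law spectrum, decaying target, Gaussian sketch."""
    idx = np.arange(1, d + 1, dtype=float)
    lam = idx ** (-a)      # covariance eigenvalues i^(-a); exponent a is an assumption
    w_star = idx ** (-b)   # decaying target, a stand-in for the source condition
    S = rng.standard_normal((d, m)) / np.sqrt(m)  # assumed Gaussian sketch, d -> m
    return lam, w_star, S

def sample(n, lam, w_star, S, noise, rng):
    """Draw n Gaussian samples and return sketched features with noisy labels."""
    X = rng.standard_normal((n, lam.size)) * np.sqrt(lam)
    y = X @ w_star + noise * rng.standard_normal(n)
    return X @ S, y

def mse(v, Z, y):
    """Mean squared prediction error of the sketched linear model v."""
    return float(np.mean((Z @ v - y) ** 2))

def batch_step(v, Z, y, lr):
    """One mini-batch gradient step on the squared loss."""
    grad = Z.T @ (Z @ v - y) / len(y)
    return v - lr * grad

def one_pass(Z, y, batch, lr):
    """Each sample is used at most once, in consecutive batches."""
    v = np.zeros(Z.shape[1])
    for start in range(0, len(y) - batch + 1, batch):
        sl = slice(start, start + batch)
        v = batch_step(v, Z[sl], y[sl], lr)
    return v

def multi_pass_with_replacement(Z, y, batch, lr, steps, rng):
    """Every step draws batch indices uniformly at random from the fixed dataset."""
    v = np.zeros(Z.shape[1])
    for _ in range(steps):
        pick = rng.integers(0, len(y), size=batch)
        v = batch_step(v, Z[pick], y[pick], lr)
    return v

def multi_pass_without_replacement(Z, y, batch, lr, epochs, rng):
    """Every epoch reshuffles the dataset and visits each sample at most once."""
    v = np.zeros(Z.shape[1])
    for _ in range(epochs):
        perm = rng.permutation(len(y))
        for start in range(0, len(y) - batch + 1, batch):
            pick = perm[start:start + batch]
            v = batch_step(v, Z[pick], y[pick], lr)
    return v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam, w_star, S = make_problem(d=200, m=20, a=1.5, b=1.0, rng=rng)
    Z, y = sample(512, lam, w_star, S, noise=0.1, rng=rng)
    Z_test, y_test = sample(2000, lam, w_star, S, noise=0.1, rng=rng)
    runs = {
        "one-pass": one_pass(Z, y, 16, 0.1),
        "with replacement": multi_pass_with_replacement(Z, y, 16, 0.1, 64, rng),
        "without replacement": multi_pass_without_replacement(Z, y, 16, 0.1, 2, rng),
    }
    for name, v in runs.items():
        print(name, mse(v, Z_test, y_test))
```

Sweeping `batch` at a fixed sample budget and comparing the held-out error of the three runs is one way to probe empirically the kind of batch-size dependence the paper studies in theory.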
- AI researchers
- Cloud computing providers
- Big data analytics firms
- Inefficient AI training methods
- Organizations with limited compute resources
More precise selection of mini-batch sizes for AI model training based on theoretical guarantees.
Reduced training times and computational costs for large-scale machine learning applications.
Acceleration of research into and deployment of foundation models due to improved training efficiency.
Read at arXiv cs.LG