On Tight Convergence Rates of Without-replacement SGD
For solving finite-sum optimization problems, SGD with without-replacement sampling has been shown empirically to outperform standard (with-replacement) SGD. Denoting by n the number of components in the cost and by K the number of epochs of the algorithm, several recent works have shown convergence rates for without-replacement SGD that have better dependence on n and K than the baseline rate of O(1/(nK)) for SGD. However, those works share two main limitations: the rates have extra poly-logarithmic factors in nK, and, denoting by κ the condition number of the problem, the rates hold only after κ^c log(nK) epochs for some c > 0. In this work, we overcome these limitations by analyzing step sizes that vary across epochs.
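To make the setting concrete, the following is a minimal sketch of without-replacement SGD with a step size that is held fixed within an epoch and changed between epochs. It is illustrative only: the one-dimensional least-squares objective, the function names, and the schedule `0.2 / (k + 1)` are assumptions chosen for the example, not the step sizes analyzed in the paper.

```python
import random


def shuffled_sgd(a, b, x0, num_epochs, step_size, seed=0):
    """Without-replacement SGD on f(x) = (1/n) * sum_i 0.5 * (a[i] * x - b[i]) ** 2.

    `step_size` maps the epoch index k to the step size used in that epoch.
    """
    rng = random.Random(seed)
    n = len(a)
    x = x0
    for k in range(num_epochs):
        eta = step_size(k)       # constant within the epoch, varies across epochs
        order = list(range(n))
        rng.shuffle(order)       # fresh permutation: each component is visited exactly once
        for i in order:
            grad = a[i] * (a[i] * x - b[i])  # gradient of the i-th component
            x -= eta * grad
    return x


def least_squares_minimizer(a, b):
    """Closed-form minimizer of the one-dimensional least-squares objective."""
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)


# Toy problem with n = 4 components (hypothetical data).
a = [1.0, 2.0, 0.5, 1.5]
b = [2.0, 3.0, 1.0, 4.0]

x_star = least_squares_minimizer(a, b)
# Hypothetical epoch-dependent schedule, used only for illustration.
x_hat = shuffled_sgd(a, b, x0=0.0, num_epochs=200, step_size=lambda k: 0.2 / (k + 1))
print(abs(x_hat - x_star))
```

Here n is `len(a)` and K is `num_epochs`, so the loop performs nK component-gradient steps in total. Replacing the shuffle with independent uniform draws of `i` would give the with-replacement baseline.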