Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule

09/14/2023
by Jason M. Altschuler, et al.

Can we accelerate convergence of gradient descent without changing the algorithm – just by carefully choosing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in k^(log_ρ 2) ≈ k^0.7864 iterations, where ρ = 1 + √2 is the silver ratio and k is the condition number. This is intermediate between the textbook unaccelerated rate k and the accelerated rate √k due to Nesterov in 1983. The non-strongly convex setting is conceptually identical, and standard black-box reductions imply an analogous accelerated rate ε^(-log_ρ 2) ≈ ε^(-0.7864). We conjecture and provide partial evidence that these rates are optimal among all possible stepsize schedules. The Silver Stepsize Schedule is constructed recursively in a fully explicit way. It is non-monotonic, fractal-like, and approximately periodic with period k^(log_ρ 2). This leads to a phase transition in the convergence rate: initially super-exponential (acceleration regime), then exponential (saturation regime).
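As a rough numerical illustration of the rates quoted above (not taken from the paper itself), the following Python sketch computes the exponent log_ρ 2 and compares the three iteration counts for a sample condition number; the choice k = 10^4 is an arbitrary assumption for illustration.

```python
import math

# Silver ratio and the exponent log_rho(2) quoted in the abstract.
rho = 1 + math.sqrt(2)
exponent = math.log(2, rho)
print(f"log_rho(2) = {exponent:.4f}")  # ~ 0.7864

# Example condition number (an arbitrary choice, not from the paper).
k = 10**4

# Compare the three iteration-count scalings mentioned in the abstract.
print(f"unaccelerated   ~ k         = {k:.0f}")
print(f"silver schedule ~ k^0.7864  = {k**exponent:.0f}")
print(f"Nesterov        ~ sqrt(k)   = {math.sqrt(k):.0f}")
```

For k = 10^4 this gives roughly 10,000 vs. about 1,400 vs. 100 iterations, placing the silver rate strictly between the unaccelerated and fully accelerated rates.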
