On the Outsized Importance of Learning Rates in Local Update Methods

07/02/2020
by Zachary Charles, et al.

We study a family of algorithms, which we refer to as local update methods, that generalize many federated learning and meta-learning algorithms. We prove that for quadratic objectives, local update methods perform stochastic gradient descent on a surrogate loss function which we exactly characterize. We show that the choice of client learning rate controls the condition number of that surrogate loss, as well as the distance between the minimizers of the surrogate and true loss functions. We use this theory to derive novel convergence rates for federated averaging that showcase the trade-off between the condition number of the surrogate loss and its alignment with the true loss function. We validate our results empirically, showing that in communication-limited settings, proper learning rate tuning is often sufficient to reach near-optimal behavior. We also present a practical method for automatic learning rate decay in local update methods that helps reduce the need for learning rate tuning, and highlight its empirical performance on a variety of tasks and datasets.
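
One way to make the abstract's quadratic claim concrete (a sketch for the deterministic, full-batch case; the paper's exact characterization may be stated or normalized differently): for a client loss f_i(x) = ½ (x − b_i)ᵀ A_i (x − b_i), K local gradient steps with client learning rate γ move x to a point y with x − y = (I − (I − γA_i)^K)(x − b_i). Averaging these pseudo-gradients across clients is therefore exactly a gradient step on a surrogate quadratic whose Hessian has I − (I − γA_i)^K in place of A_i. As γ → 0 that matrix approaches γK·A_i, so the surrogate and true minimizers coincide; larger γ pushes its spectrum toward the identity, improving conditioning but shifting the minimizer. The NumPy sketch below illustrates this trade-off numerically; it is not the paper's code, and the helpers random_client and local_update_round are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

def random_client():
    # Hypothetical synthetic client with quadratic loss
    # f_i(x) = 0.5 * (x - b_i)^T A_i (x - b_i). A_i is scaled so its
    # largest eigenvalue is ~1.05, keeping gamma * lambda_max well below
    # the stability limit of 2 for every gamma used below.
    M = rng.normal(size=(d, d))
    A = M @ M.T
    A = A / np.linalg.eigvalsh(A).max() + 0.05 * np.eye(d)
    b = rng.normal(size=d)
    return A, b

clients = [random_client() for _ in range(4)]

# Minimizer of the true average loss: solve (sum_i A_i) x = sum_i A_i b_i.
x_star = np.linalg.solve(sum(A for A, _ in clients),
                         sum(A @ b for A, b in clients))

def local_update_round(x, gamma, K, eta=1.0):
    """One communication round of a local update method (FedAvg-style,
    full-batch): each client runs K gradient steps with client learning
    rate gamma; the server applies the averaged pseudo-gradient x - y_K
    with server learning rate eta."""
    deltas = []
    for A, b in clients:
        y = x.copy()
        for _ in range(K):
            y = y - gamma * (A @ (y - b))
        deltas.append(x - y)
    return x - eta * np.mean(deltas, axis=0)

for gamma in [0.5, 0.1, 0.01]:
    x = np.zeros(d)
    for _ in range(5000):
        x = local_update_round(x, gamma, K=10)
    # The iterates converge to the surrogate minimizer; its distance to the
    # true minimizer shrinks as the client learning rate decreases.
    print(f"gamma={gamma:5.2f}  ||x - x_star|| = {np.linalg.norm(x - x_star):.5f}")
```

Full-batch gradients make each round an affine contraction, so the iterates converge exactly to the surrogate minimizer; with stochastic client gradients the same surrogate minimizer is targeted in expectation, matching the surrogate-SGD view in the abstract.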


Related research

Convergence and Accuracy Trade-Offs in Federated Learning and Meta-Learning (03/08/2021)
WNGrad: Learn the Learning Rate in Gradient Descent (03/07/2018)
Amortized Proximal Optimization (02/28/2022)
Iterated Vector Fields and Conservatism, with Applications to Federated Learning (09/08/2021)
Extending AdamW by Leveraging Its Second Moment and Magnitude (12/09/2021)
Automatic, Dynamic, and Nearly Optimal Learning Rate Specification by Local Quadratic Approximation (04/07/2020)
Towards Combining On-Off-Policy Methods for Real-World Applications (04/24/2019)
