Practical Newton-Type Distributed Learning using Gradient Based Approximations

07/22/2019
by Samira Sheikhi, et al.

We study distributed algorithms for expected loss minimization when the dataset is too large to fit on a single machine and must be partitioned across several machines. The problem typically reduces to minimizing the average of a set of convex functions, where each function is the empirical risk on one part of the data. In the distributed setting, where individual data instances can be accessed only on the local machines, the algorithm proceeds in rounds of local computation followed by communication among the machines. Since communication is usually far more expensive than local computation, it should be reduced as much as possible; at the same time, the local computation must not become so heavy that it is a burden in practice. Second-order methods can speed up convergence and thereby reduce the amount of communication needed, and several distributed second-order methods have been proposed. Although these methods converge quickly, their local computation is expensive and leaves room for improvement in practical use. In this study we modify an existing approach, DANE (Distributed Approximate NEwton), to reduce its computational cost while maintaining its accuracy. Instead of solving the local subproblem exactly in every communication round, we solve it approximately with iterative methods. We study how different iterative solvers affect the behavior of the algorithm and aim for an appropriate tradeoff between the amount of local computation and the amount of communication required. We demonstrate the practicality of our algorithm and compare it with existing distributed gradient-based methods such as SGD.
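
The core idea, DANE with inexact local solves, can be illustrated with a minimal sketch on a regularized least-squares objective: each machine replaces the exact solution of its local subproblem with a few iterations of a cheap first-order solver, and the machines average their results after every round. This is only an illustrative sketch, not the authors' implementation; the least-squares loss, the plain gradient-descent inner loop, and all names and parameters (local_grad, solve_local_inexactly, inexact_dane, eta, mu, n_inner, step) are assumptions made for the example.

    import numpy as np

    def local_grad(X, y, w, lam):
        # Gradient of the local regularized least-squares risk
        # (1/2n)||Xw - y||^2 + (lam/2)||w||^2.
        return X.T @ (X @ w - y) / X.shape[0] + lam * w

    def solve_local_inexactly(X, y, w_t, g_global, lam, eta=1.0, mu=1e-2,
                              n_inner=20, step=0.1):
        # Approximately minimize the DANE local subproblem
        #   phi_i(w) - <grad phi_i(w_t) - eta * g_global, w> + (mu/2)||w - w_t||^2
        # with a few plain gradient steps instead of an exact solve.
        g_i_t = local_grad(X, y, w_t, lam)
        w = w_t.copy()
        for _ in range(n_inner):
            g = local_grad(X, y, w, lam) - (g_i_t - eta * g_global) + mu * (w - w_t)
            w = w - step * g
        return w

    def inexact_dane(shards, lam=1e-3, n_rounds=10):
        # shards: list of (X_i, y_i) pairs, one per machine.
        d = shards[0][0].shape[1]
        w = np.zeros(d)
        for _ in range(n_rounds):
            # Communication round 1: average the local gradients.
            g_global = np.mean([local_grad(X, y, w, lam) for X, y in shards], axis=0)
            # Local computation: each machine runs a cheap iterative solver.
            w_locals = [solve_local_inexactly(X, y, w, g_global, lam) for X, y in shards]
            # Communication round 2: average the (approximate) local solutions.
            w = np.mean(w_locals, axis=0)
        return w

Swapping the inner gradient loop for another solver (e.g., conjugate gradient or a local SGD pass) changes only solve_local_inexactly; this inner solver and its iteration budget are the knobs that trade local computation against communication.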

