Distributed Sparse Linear Regression with Sublinear Communication
We study high-dimensional sparse linear regression in a distributed setting under both computational and communication constraints. Specifically, we consider a star-topology network in which several machines are connected to a fusion center, with which they can exchange relatively short messages. Each machine holds noisy samples from a linear regression model with the same unknown sparse d-dimensional vector of regression coefficients θ. The goal of the fusion center is to estimate the vector θ and its support using few computations and limited communication at each machine. In this work, we consider distributed algorithms based on Orthogonal Matching Pursuit (OMP) and theoretically study their ability to exactly recover the support of θ. We prove that under certain conditions, even at low signal-to-noise ratios where individual machines are unable to detect the support of θ, distributed-OMP methods correctly recover it with total communication sublinear in d. In addition, we present simulations that illustrate the performance of distributed OMP-based algorithms and show that they perform similarly to more sophisticated and computationally intensive methods, and in some cases even outperform them.
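To make the setting concrete, the following is a minimal sketch of one possible distributed-OMP scheme, written under assumptions that the abstract does not state. Each machine runs K steps of standard OMP on its own data and sends only the K selected indices to the fusion center, which keeps the K indices with the most votes. The voting rule, the helper names (solve, omp_support, distributed_omp, make_machine), and all simulation parameters are hypothetical choices for illustration and are not necessarily the algorithms analyzed in the paper. The sketch uses only the Python standard library.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def omp_support(X, y, K):
    """Run K steps of OMP on one machine's data; return the selected indices."""
    n, d = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(d)]
    norms = [sum(v * v for v in col) ** 0.5 for col in cols]
    support, resid = [], y[:]
    for _ in range(K):
        # Pick the unselected column most correlated with the current residual.
        scores = [
            -1.0 if j in support
            else abs(sum(cols[j][i] * resid[i] for i in range(n))) / norms[j]
            for j in range(d)
        ]
        support.append(max(range(d), key=scores.__getitem__))
        # Refit by least squares on the support (normal equations).
        G = [[sum(cols[p][i] * cols[q][i] for i in range(n)) for q in support]
             for p in support]
        rhs = [sum(cols[p][i] * y[i] for i in range(n)) for p in support]
        coef = solve(G, rhs)
        resid = [y[i] - sum(c * cols[p][i] for c, p in zip(coef, support))
                 for i in range(n)]
    return sorted(support)

def distributed_omp(machines, K):
    """Fusion center: count the indices sent by each machine, keep the top K."""
    votes = {}
    for X, y in machines:
        for j in omp_support(X, y, K):  # each machine sends only K indices
            votes[j] = votes.get(j, 0) + 1
    ranked = sorted(votes, key=lambda j: (-votes[j], j))
    return sorted(ranked[:K])

def make_machine(rng, n, d, theta, sigma):
    """Simulate n noisy samples y = <x, theta> + noise (illustrative parameters)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [sum(X[i][j] * t for j, t in theta.items()) + rng.gauss(0.0, sigma)
         for i in range(n)]
    return X, y

rng = random.Random(0)
d, K, n, num_machines = 50, 3, 60, 10
theta = {3: 2.0, 17: 2.0, 42: 2.0}  # hypothetical sparse coefficient vector
machines = [make_machine(rng, n, d, theta, 0.5) for _ in range(num_machines)]
estimate = distributed_omp(machines, K)
print(estimate)
```

In this sketch each machine transmits K indices of roughly log2(d) bits each, so the total incoming communication is about M · K · log2(d) bits for M machines. That is far fewer than the d numbers a machine would need to send to communicate a full coefficient estimate, provided M · K · log2(d) is small relative to d. The simulated signal here is deliberately strong, so it does not reproduce the low signal-to-noise regime discussed in the abstract.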