Learning ReLU Networks via Alternating Minimization

06/20/2018
by Gauri Jagatap, et al.

We propose and analyze a new family of algorithms for training neural networks with ReLU activations. Our algorithms are based on the technique of alternating minimization: estimating the activation patterns of each ReLU for all given samples, interleaved with weight updates via a least-squares step. We consider three cases of this model: (i) a single ReLU; (ii) 1-hidden-layer networks with k hidden ReLUs; and (iii) 2-hidden-layer networks. We show that under standard distributional assumptions on the input data, our algorithm provably recovers the true "ground truth" parameters in a linearly convergent fashion; furthermore, our method requires only O(d) samples in the single-ReLU case and O(dk^2) samples in the 1-hidden-layer case. We also extend this framework to deeper networks and empirically demonstrate its convergence to a global minimum.
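To make the alternating scheme concrete, below is a minimal sketch of the single-ReLU case on synthetic Gaussian data: each iteration re-estimates the ReLU activation pattern under the current weights, then refits the weights by least squares on the samples estimated to be active. The function name `altmin_single_relu`, the simple first-moment initialization, and the fixed iteration count are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def altmin_single_relu(X, y, w0, iters=50):
    """Alternating-minimization sketch for y ~ relu(X @ w_true).

    X: (n, d) design matrix, y: (n,) responses, w0: (d,) initial guess.
    This is an illustrative sketch, not the authors' exact algorithm.
    """
    w = w0.copy()
    for _ in range(iters):
        # Step 1: estimate the activation pattern s_i = 1[x_i^T w > 0]
        # under the current weight estimate.
        active = X @ w > 0
        if not np.any(active):
            break  # degenerate pattern; a fresh initialization would be needed
        # Step 2: least-squares weight update restricted to the samples
        # currently estimated to be active.
        w, *_ = np.linalg.lstsq(X[active], y[active], rcond=None)
    return w

# Toy usage on Gaussian inputs (the standard distributional assumption).
rng = np.random.default_rng(0)
n, d = 2000, 20
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = relu(X @ w_true)
w_hat = altmin_single_relu(X, y, w0=X.T @ y / n)  # illustrative initialization
print(np.linalg.norm(w_hat - w_true))
```

In the 1-hidden-layer case the same two steps apply per hidden unit: the activation pattern of each of the k ReLUs is estimated from the current weights, and the weight matrix is refit by least squares with those patterns held fixed.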
