Learning ReLU Networks via Alternating Minimization
We propose and analyze a new family of algorithms for training neural networks with ReLU activations. Our algorithms are based on the technique of alternating minimization: estimating the activation patterns of each ReLU for all given samples, interleaved with weight updates via a least-squares step. We consider three cases of this model: (i) a single ReLU; (ii) 1-hidden-layer networks with k hidden ReLUs; and (iii) 2-hidden-layer networks. We show that under standard distributional assumptions on the input data, our algorithm provably recovers the ground-truth parameters at a linear rate of convergence; furthermore, our method requires only O(d) samples for the single-ReLU case and O(dk^2) samples in the 1-hidden-layer case. We also extend this framework to deeper networks and empirically demonstrate its convergence to a global minimum.
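To make the alternating structure concrete, the following is a minimal sketch of the single-ReLU case under a Gaussian-data assumption: each iteration estimates the activation pattern of the ReLU on every sample using the current weights, then refits the weights by least squares with those patterns held fixed. Function and variable names are illustrative, and this is an assumed simplification of the general idea, not the paper's exact procedure.

import numpy as np

def altmin_single_relu(X, y, num_iters=50, seed=0):
    # Alternating-minimization sketch for y ~ max(0, X @ w_true).
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d) / np.sqrt(d)   # hypothetical random initialization

    for _ in range(num_iters):
        # Step 1: estimate the activation pattern of the ReLU for every sample.
        s = (X @ w > 0).astype(float)         # s_i = 1{w^T x_i > 0}

        # Step 2: least-squares weight update with activations held fixed:
        #   minimize_w  sum_i (s_i * x_i^T w - y_i)^2
        A = X * s[:, None]
        w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

# Usage on synthetic data drawn from a standard Gaussian (the kind of
# distributional assumption the abstract refers to):
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n = 20, 2000
    w_true = rng.standard_normal(d)
    X = rng.standard_normal((n, d))
    y = np.maximum(X @ w_true, 0.0)
    w_hat = altmin_single_relu(X, y)
    print("relative error:", np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))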