Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions

08/20/2018
by   Zaiyi Chen, et al.

Although the stochastic gradient descent (SGD) method and its variants (e.g., stochastic momentum methods, ADAGRAD) are the algorithms of choice for solving non-convex problems (especially deep learning), big gaps remain between theory and practice, with many questions unresolved. For example, there is still a lack of convergence theory for SGD when it uses a stagewise step size and returns an averaged solution. In addition, theoretical insight into why the adaptive step size of ADAGRAD can improve over the non-adaptive step size of SGD is still missing for non-convex optimization. This paper aims to address these questions and fill the gap between theory and practice. We propose a universal stagewise optimization framework for a broad family of non-smooth non-convex problems with the following key features: (i) each stage calls a basic algorithm (e.g., SGD or ADAGRAD) on a regularized convex problem and returns an averaged solution; (ii) the step size is decreased in a stagewise manner; (iii) an averaged solution is returned as the final solution, selected from all stagewise averaged solutions with sampling probabilities that increase with the stage number. Our theoretical results for stagewise ADAGRAD exhibit its adaptive convergence and therefore shed light on its faster convergence, compared with stagewise SGD, for problems with sparse stochastic gradients. To the best of our knowledge, these new results are the first of their kind to address the unresolved issues of the existing theories mentioned above.
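The stagewise framework described in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal, hedged illustration, not the paper's exact algorithm: it assumes a user-supplied stochastic gradient oracle `grad_oracle` for the original objective, and the hyper-parameter names (`eta0`, `T0`, `gamma`) and the particular stagewise schedules are hypothetical choices made for concreteness.

```python
import numpy as np

def stagewise_sgd(grad_oracle, x0, num_stages=10, eta0=0.1, T0=100, gamma=1.0, seed=0):
    """Sketch of a stagewise optimization loop with SGD as the inner solver.

    grad_oracle(x) should return a stochastic (sub)gradient of the
    original non-convex objective at x.  All schedules below are
    illustrative assumptions, not the paper's prescribed settings.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float)
    stage_averages = []

    for s in range(1, num_stages + 1):
        eta_s = eta0 / s      # step size decreased in a stagewise manner
        T_s = T0 * s          # more inner iterations per stage (one common choice)
        x = x_prev.copy()
        running_sum = np.zeros_like(x)

        for _ in range(T_s):
            # Stochastic gradient of the regularized (convex) stage problem:
            #   f(x) + (1 / (2 * gamma)) * ||x - x_prev||^2
            g = grad_oracle(x) + (x - x_prev) / gamma
            x = x - eta_s * g
            running_sum += x

        x_prev = running_sum / T_s          # averaged solution of this stage
        stage_averages.append(x_prev.copy())

    # Return one stagewise averaged solution, sampled with probability
    # increasing in the stage number (here: proportional to s).
    probs = np.arange(1, num_stages + 1, dtype=float)
    probs /= probs.sum()
    pick = rng.choice(num_stages, p=probs)
    return stage_averages[pick]
```

The inner loop above uses plain SGD on the regularized stage problem; swapping in an adaptive update such as ADAGRAD for the inner iterations follows the same pattern, which is what the paper's stagewise ADAGRAD result refers to.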
