Stochastic gradient-free descents
In this paper we propose stochastic gradient-free methods and gradient-free methods with momentum for solving stochastic optimization problems. Our fundamental idea is not to evaluate and apply gradients directly, but to learn information about gradients indirectly from stochastic directions and the corresponding output feedback of the objective function, which may further broaden the scope of applications. Without using gradients, these methods still attain the sublinear convergence rate O(1/k) with a decaying stepsize α_k = O(1/k) for strongly convex objectives with Lipschitz gradients, and they converge to a solution with a zero expected gradient norm when the objective function is nonconvex, twice differentiable, and bounded below. In addition, we provide a theoretical analysis of the momentum term in stochastic settings, showing that it introduces extra bias but reduces the variance of the stochastic directions.
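The abstract describes the general idea of replacing gradient evaluations with stochastic directions and function-value feedback, combined with a momentum term and a decaying stepsize α_k = O(1/k). The sketch below is only an illustration of that general scheme, not the paper's exact update rules: the smoothing parameter mu, the momentum coefficient beta, and the specific direction estimate (f(x + mu·u) − f(x))/mu · u are assumptions for the example.

```python
import numpy as np

def gradient_free_momentum_descent(f, x0, num_iters=1000, mu=1e-4,
                                   alpha0=1.0, beta=0.9, rng=None):
    """Sketch of a stochastic gradient-free descent with momentum.

    At each step a random direction u_k is sampled and the objective is
    queried at x_k and x_k + mu * u_k; the scaled difference serves as a
    surrogate for gradient information. A momentum term averages these
    stochastic directions, and the stepsize decays as alpha_k = O(1/k).
    (Hypothetical parameter choices; not the paper's exact method.)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)                      # momentum accumulator
    for k in range(1, num_iters + 1):
        u = rng.standard_normal(x.shape)      # stochastic search direction
        # Output feedback: (f(x + mu*u) - f(x)) / mu approximates <grad f(x), u>
        g = (f(x + mu * u) - f(x)) / mu * u
        m = beta * m + (1.0 - beta) * g       # momentum: extra bias, lower variance
        alpha_k = alpha0 / k                  # decaying stepsize, alpha_k = O(1/k)
        x -= alpha_k * m
    return x

# Example: minimize a simple strongly convex quadratic without using gradients.
if __name__ == "__main__":
    f = lambda x: 0.5 * np.sum(x ** 2)
    x_final = gradient_free_momentum_descent(f, x0=np.ones(5), num_iters=5000)
    print("final iterate:", x_final)
```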