Towards Asymptotic Optimality with Conditioned Stochastic Gradient Descent
In this paper, we investigate a general class of stochastic gradient descent (SGD) algorithms, called conditioned SGD, based on preconditioning of the gradient direction. Under mild assumptions, namely L-smoothness of the objective function and a weak growth condition on the noise, we establish almost sure convergence and asymptotic normality of the iterates for a broad class of conditioning matrices. In particular, when the conditioning matrix is an estimate of the inverse Hessian at the optimal point, the algorithm is shown to be asymptotically optimal. The benefits of this approach are validated on simulated and real datasets.
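As a rough illustration of the update rule described in the abstract (not the authors' implementation), the following is a minimal sketch of a conditioned SGD iteration x_{k+1} = x_k - gamma_k * C_k * g_k, where g_k is a stochastic gradient and C_k a conditioning matrix, e.g. an estimate of the inverse Hessian at the optimum. All names (`conditioned_sgd`, `grad_fn`, `conditioner`, `step_size`) are hypothetical placeholders.

```python
import numpy as np

def conditioned_sgd(grad_fn, conditioner, step_size, x0, n_steps, rng=None):
    """Sketch of conditioned (preconditioned) SGD.

    grad_fn(x, rng)   -> stochastic gradient estimate g_k (hypothetical signature)
    conditioner(x, k) -> conditioning matrix C_k, e.g. an inverse-Hessian estimate
    step_size(k)      -> step size gamma_k (e.g. gamma_k = c / (k + 1))
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for k in range(n_steps):
        g = grad_fn(x, rng)              # stochastic gradient g_k
        C = conditioner(x, k)            # conditioning matrix C_k
        x = x - step_size(k) * (C @ g)   # preconditioned SGD update
    return x

# Toy usage on a quadratic objective f(x) = 0.5 * x^T A x, with noisy gradients.
A = np.array([[3.0, 0.0], [0.0, 0.5]])
grad_fn = lambda x, rng: A @ x + 0.1 * rng.standard_normal(2)
conditioner = lambda x, k: np.linalg.inv(A)      # idealized inverse-Hessian conditioning
step_size = lambda k: 1.0 / (k + 1)
x_hat = conditioned_sgd(grad_fn, conditioner, step_size, x0=np.ones(2), n_steps=500)
```

Replacing the fixed inverse Hessian by a running estimate would correspond to the adaptive conditioning matrices studied in the paper; the idealized version above is only meant to convey the shape of the update.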