An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models

by Nhat Ho, et al.

Using gradient descent (GD) with a fixed or decaying step-size is standard practice in unconstrained optimization problems. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down, as it cannot explore the flat curvature of the loss function. To overcome that issue, we propose to exponentially increase the step-size of the GD algorithm. Under homogeneity assumptions on the loss function, we demonstrate that the iterates of the proposed exponential step-size gradient descent (EGD) algorithm converge linearly to the optimal solution. Leveraging that optimization insight, we then consider using the EGD algorithm for parameter estimation in non-regular statistical models whose loss function becomes locally convex as the sample size goes to infinity. We demonstrate that the EGD iterates reach the final statistical radius around the true parameter after a logarithmic number of iterations, in stark contrast to the polynomial number of iterations required by GD. Therefore, the total computational complexity of the EGD algorithm is optimal and exponentially cheaper than that of GD for parameter estimation in non-regular statistical models. To the best of our knowledge, this resolves a long-standing gap between the statistical and algorithmic computational complexities of parameter estimation in non-regular statistical models. Finally, we provide targeted applications of the general theory to several classes of statistical models, including generalized linear models with polynomial link functions and location Gaussian mixture models.
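The contrast between a fixed and an exponentially increasing step-size can be seen on a one-dimensional toy problem. The sketch below is a minimal illustration, not the authors' code: it assumes the schedule eta_t = eta * alpha^t with alpha > 1, and the homogeneous loss f(theta) = theta^4 / 4, whose curvature vanishes at the minimizer theta = 0. The values of eta, alpha, theta0 and the number of steps are hypothetical. With a fixed step, the GD iterates decay only like 1/sqrt(2 * eta * t). Under EGD, writing s_t = eta * alpha^t * theta_t^2, the update gives theta_{t+1} = (1 - s_t) * theta_t and s_{t+1} = alpha * s_t * (1 - s_t)^2. This map has a fixed point s* = 1 - 1/sqrt(alpha) that is locally stable for 1 < alpha < 4, so the iterates eventually contract by a factor of about 1/sqrt(alpha) per step, i.e. linearly.

```python
import math


def grad(theta: float) -> float:
    """Gradient of the toy homogeneous loss f(theta) = theta**4 / 4 (an assumption)."""
    return theta ** 3


def gd(theta0: float, eta: float, steps: int) -> list[float]:
    """Plain gradient descent with a fixed step-size eta."""
    path = [theta0]
    for _ in range(steps):
        path.append(path[-1] - eta * grad(path[-1]))
    return path


def egd(theta0: float, eta: float, alpha: float, steps: int) -> list[float]:
    """Gradient descent with the assumed schedule eta_t = eta * alpha**t, alpha > 1."""
    path = [theta0]
    for t in range(steps):
        step_size = eta * alpha ** t  # step-size grows geometrically with t
        path.append(path[-1] - step_size * grad(path[-1]))
    return path


# Hypothetical parameter values, chosen only for illustration.
theta0, eta, alpha, steps = 1.0, 0.1, 1.5, 100
gd_path = gd(theta0, eta, steps)
egd_path = egd(theta0, eta, alpha, steps)

print(f"GD  after {steps} steps: {gd_path[-1]:.3e}")
print(f"EGD after {steps} steps: {egd_path[-1]:.3e}")
# Empirical per-step contraction of EGD versus the predicted 1/sqrt(alpha).
print(egd_path[-1] / egd_path[-2], 1 / math.sqrt(alpha))
```

In this run the fixed-step GD iterate is still of order 0.2 after 100 steps, while the EGD iterate is many orders of magnitude smaller and its per-step ratio matches 1/sqrt(alpha). The sketch covers only the deterministic optimization behavior; it does not simulate the sample-based loss or the statistical radius discussed above.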




