SA-GD: Improved Gradient Descent Learning Strategy with Simulated Annealing

by   Zhicheng Cai, et al.

Gradient descent algorithm is the most utilized method when optimizing machine learning issues. However, there exists many local minimums and saddle points in the loss function, especially for high dimensional non-convex optimization problems like deep learning. Gradient descent may make loss function trapped in these local intervals which impedes further optimization, resulting in poor generalization ability. This paper proposes the SA-GD algorithm which introduces the thought of simulated annealing algorithm to gradient descent. SA-GD method offers model the ability of mounting hills in probability, tending to enable the model to jump out of these local areas and converge to a optimal state finally. We took CNN models as an example and tested the basic CNN models on various benchmark datasets. Compared to the baseline models with traditional gradient descent algorithm, models with SA-GD algorithm possess better generalization ability without sacrificing the efficiency and stability of model convergence. In addition, SA-GD can be utilized as an effective ensemble learning approach which improves the final performance significantly.


page 1

page 2

page 3

page 4


Escaping Saddle Points for Zeroth-order Nonconvex Optimization using Estimated Gradient Descent

Gradient descent and its variants are widely used in machine learning. H...

Update in Unit Gradient

In Machine Learning, optimization mostly has been done by using a gradie...

Simulated Annealing in Early Layers Leads to Better Generalization

Recently, a number of iterative learning methods have been introduced to...

Gradient descent in Gaussian random fields as a toy model for high-dimensional optimisation in deep learning

In this paper we model the loss function of high-dimensional optimizatio...

Derivative-Free Global Optimization Algorithms: Bayesian Method and Lipschitzian Approaches

In this paper, we will provide an introduction to the derivative-free op...

A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning

Adaptive gradient methods have become popular in optimizing deep neural ...

Please sign up or login with your details

Forgot password? Click here to reset