Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning

by Andrei Kucharavy, et al.
HES-SO Valais-Wallis

Whenever applicable, Stochastic Gradient Descent (SGD) has proven unreasonably effective. Rather than underperforming and getting trapped in local minima because of batch noise, SGD leverages that noise to generalize better and to find minima that are good enough for the entire dataset. This has motivated numerous theoretical and experimental investigations, especially of Artificial Neural Networks (ANNs), which have led to better machine learning algorithms. However, SGD is not applicable in non-differentiable settings, which puts all of that prior research out of reach. In this paper, we show that a class of evolutionary algorithms (EAs) inspired by the Gillespie-Orr Mutational Landscapes model of natural evolution is formally equivalent to SGD in certain settings and, in practice, is well suited to large ANNs. We refer to these EAs as the Gillespie-Orr EA class (GO-EAs) and show empirically how insights from SGD can transfer to them. We also show that, for ANNs trained to near-optimality or in the transfer learning setting, the equivalence allows insights from the Mutational Landscapes model to transfer to SGD. Finally, we use this equivalence to show experimentally how SGD and GO-EAs can provide mutual insight, through examples of minima flatness, transfer learning, and the mixing of individuals in EAs applied to large models.
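
To make the link between GO-EAs and SGD more concrete, here is a minimal sketch built on our own assumptions, not the authors' algorithm. We assume a GO-EA step draws a minibatch, proposes a single small Gaussian mutation of the parameters, and keeps it only if it lowers the minibatch loss, in the spirit of a Gillespie-Orr adaptive walk. The toy model (a one-parameter linear fit), the function names `go_ea_step`, `run_go_ea` and `sgd`, and all hyperparameters are hypothetical. For a small mutation scale `sigma`, the expected accepted mutation points along the negative minibatch gradient, so both procedures drift toward the same minimum.

```python
import random


def minibatch_loss(w, batch):
    # Mean squared error of the one-parameter model y = w * x on a minibatch.
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def go_ea_step(w, data, batch_size, sigma, rng):
    # Hypothetical GO-EA step (assumed, not the paper's exact algorithm):
    # one small Gaussian mutation, kept only if it lowers the minibatch loss.
    batch = rng.sample(data, batch_size)
    candidate = w + rng.gauss(0.0, sigma)
    if minibatch_loss(candidate, batch) < minibatch_loss(w, batch):
        return candidate
    return w


def run_go_ea(data, w0=0.0, steps=2000, batch_size=8, sigma=0.05, seed=0):
    # Repeats the hypothetical GO-EA step from a fixed starting point.
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        w = go_ea_step(w, data, batch_size, sigma, rng)
    return w


def sgd(data, w0=0.0, steps=2000, batch_size=8, lr=0.05, seed=0):
    # Plain minibatch SGD on the same loss, for comparison.
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        # Analytic gradient of the minibatch MSE with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
    return w


# Toy data from y = 3x plus small Gaussian noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
data = [(x, 3.0 * x + data_rng.gauss(0.0, 0.1)) for x in xs]

print("GO-EA:", run_go_ea(data))
print("SGD:  ", sgd(data))
```

In this toy setting the GO-EA behaves roughly like a normalized-gradient method, since its expected step size is set by `sigma` rather than by the gradient magnitude. The sketch does not reproduce the formal equivalence, which the abstract states holds only in certain settings.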


Dynamic of Stochastic Gradient Descent with State-Dependent Noise

Stochastic gradient descent (SGD) and its variants are mainstream method...

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

We propose a population-based Evolutionary Stochastic Gradient Descent (...

An Alternative View: When Does SGD Escape Local Minima?

Stochastic gradient descent (SGD) is widely used in machine learning. Al...

Evolving Differentiable Gene Regulatory Networks

Over the past twenty years, artificial Gene Regulatory Networks (GRNs) h...

A Scale Invariant Flatness Measure for Deep Network Minima

It has been empirically observed that the flatness of minima obtained fr...

On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent

Because stochastic gradient descent (SGD) has shown promise optimizing n...

Quasi-potential theory for escape problem: Quantitative sharpness effect on SGD's escape from local minima

We develop a quantitative theory on an escape problem of a stochastic gr...
