Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

07/20/2023
by Kaiyue Wen, et al.

Despite extensive study, the underlying reason why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, so a natural candidate explanation is that flatness implies generalization. This work critically examines that explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models, and sharpness minimization algorithms fail to generalize; and (3) perhaps most surprisingly, there exist non-generalizing flattest models, yet sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization depends subtly on the data distribution and the model architecture, and that sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for a search for other explanations for the generalization of over-parameterized neural networks.
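The sharpness minimization algorithms the abstract refers to include Sharpness-Aware Minimization (SAM). As a minimal sketch, and not the paper's implementation, a SAM-style update first perturbs the weights toward a nearby worst-case (sharper) point, then applies the gradient computed there; the toy quadratic loss below is an assumption made purely for illustration:

```python
def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM-style update: ascend within an L2 ball of radius rho to an
    approximate worst-case point, then descend using the gradient there."""
    g = grad_fn(w)
    norm = sum(gi * gi for gi in g) ** 0.5 or 1.0
    # Ascent: perturb the weights in the gradient direction (sharper loss).
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent: a standard gradient step, but using the gradient at w_adv.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy quadratic loss L(w) = 0.5 * (10 * w0^2 + w1^2): sharp in w0, flat in w1.
grad = lambda w: [10.0 * w[0], 1.0 * w[1]]

w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w, grad)
```

Because the descent direction is evaluated at the perturbed point, the update is biased away from sharp minimizers, which is the mechanism whose link to generalization the paper interrogates.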


Related research

05/30/2018
Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
Despite existing work on ensuring generalization of neural networks in t...

11/28/2021
Generalization Performance of Empirical Risk Minimization on Over-parameterized Deep ReLU Nets
In this paper, we study the generalization performance of global minima ...

07/05/2022
Neural Networks and the Chomsky Hierarchy
Reliable generalization lies at the heart of safe ML and AI. However, un...

09/18/2023
Context is Environment
Two lines of work are taking the central stage in AI research. On the on...

03/14/2018
Algebraic Machine Learning
Machine learning algorithms use error function minimization to fit a lar...

11/10/2022
How Does Sharpness-Aware Minimization Minimize Sharpness?
Sharpness-Aware Minimization (SAM) is a highly effective regularization ...

03/18/2022
On the Generalization Mystery in Deep Learning
The generalization mystery in deep learning is the following: Why do ove...
