Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations

by Ziquan Liu, et al.

Using weight decay to penalize the L2 norms of weights in neural networks has been a standard training practice to regularize the complexity of networks. In this paper, we show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with positively homogeneous activation functions, such as linear, ReLU and max-pooling functions. Because of this homogeneity, the function a network computes is invariant to shifting weight scales between layers (for example, multiplying one layer's weights by a constant c > 0 and dividing the next layer's weights by c). The ineffective regularizers are sensitive to such shifting and therefore regularize model capacity poorly, leading to overfitting. To address this shortcoming, we propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network. The derived regularizer is an upper bound on the input gradient of the network, so minimizing it also benefits adversarial robustness. We also consider residual connections and show that our regularizer likewise upper-bounds the input gradients of residual networks. We demonstrate the efficacy of the proposed regularizer at improving generalization and adversarial robustness on various datasets and neural network architectures.
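The scale-shifting argument above can be checked numerically. The sketch below assumes a bias-free multilayer perceptron with ReLU hidden layers and uses NumPy. The helper names are hypothetical, and `product_of_spectral_norms` is only an illustrative example of a shift-invariant quantity. It is not necessarily the regularizer proposed in the paper.

```python
import numpy as np

# Assumption: a bias-free MLP with ReLU hidden layers and a linear output.
# With biases, exact invariance would also require rescaling each bias.


def relu(z):
    return np.maximum(z, 0.0)


def forward(weights, x):
    """Apply ReLU after every layer except the last (linear) one."""
    h = x
    for W in weights[:-1]:
        h = relu(W @ h)
    return weights[-1] @ h


def weight_decay(weights):
    """Standard L2 penalty: sum of squared Frobenius norms of all layers."""
    return float(sum(np.sum(W ** 2) for W in weights))


def product_of_spectral_norms(weights):
    """Hypothetical shift-invariant penalty, used here for illustration only."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))


def shift_scale(weights, i, c):
    """Multiply layer i by c > 0 and divide layer i + 1 by c."""
    shifted = [W.copy() for W in weights]
    shifted[i] = shifted[i] * c
    shifted[i + 1] = shifted[i + 1] / c
    return shifted


def input_jacobian(weights, x):
    """Exact input Jacobian W_L D_{L-1} W_{L-1} ... D_1 W_1 at the point x."""
    J = np.eye(x.shape[0])
    h = x
    for W in weights[:-1]:
        z = W @ h
        mask = (z > 0).astype(float)  # diagonal of D_k: active ReLU units
        J = (mask[:, None] * W) @ J
        h = relu(z)
    return weights[-1] @ J


rng = np.random.default_rng(0)
dims = [8, 16, 16, 3]
weights = [rng.standard_normal((dims[k + 1], dims[k])) / np.sqrt(dims[k])
           for k in range(len(dims) - 1)]
x = rng.standard_normal(dims[0])
shifted = shift_scale(weights, 0, 10.0)

# Same function, different weight decay, same product of norms.
print(np.allclose(forward(weights, x), forward(shifted, x)))  # True
print(weight_decay(weights), weight_decay(shifted))
print(product_of_spectral_norms(weights), product_of_spectral_norms(shifted))

# ||J(x)||_2 <= prod_k ||W_k||_2, because each ReLU mask has norm <= 1.
J = input_jacobian(weights, x)
print(np.linalg.norm(J, 2) <= product_of_spectral_norms(weights))  # True
```

Scaling the first layer by 10 leaves the network output unchanged but inflates the weight decay term, while the product of spectral norms stays fixed. The Jacobian check shows why a product-of-norms style penalty also bounds the input gradient. This sketch does not cover residual connections or max-pooling.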



Why ResNet Works? Residuals Generalize

Residual connections significantly boost the performance of deep neural ...

A Novel, Scale-Invariant, Differentiable, Efficient, Scalable Regularizer

L_p-norm regularization schemes such as L_0, L_1, and L_2-norm regulariz...

SVMax: A Feature Embedding Regularizer

A neural network regularizer (e.g., weight decay) boosts performance by ...

Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality

In this note, we study how neural networks with a single hidden layer an...

A Better Way to Decay: Proximal Gradient Training Algorithms for Neural Nets

Weight decay is one of the most widely used forms of regularization in d...

Fine-grained Optimization of Deep Neural Networks

In recent studies, several asymptotic upper bounds on generalization err...

DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures

In seeking for sparse and efficient neural network models, many previous...
