Training Better CNNs Requires to Rethink ReLU

by Gangming Zhao, et al.

With the rapid development of Deep Convolutional Neural Networks (DCNNs), numerous works focus on designing better network architectures (e.g., AlexNet, VGG, Inception, ResNet and DenseNet). Nevertheless, all these networks share one characteristic: each convolutional layer is followed by an activation layer, most commonly a Rectified Linear Unit (ReLU) layer. In this work, we argue that this paired module, with a 1:1 ratio of convolution to ReLU, is not the best choice, since it may result in poor generalization ability. We therefore investigate more suitable convolution-to-ReLU ratios in order to explore better network architectures. Specifically, inspired by Leaky ReLU, we adopt a proportional module with an N:M (N>M) convolution-to-ReLU ratio to design better networks. From the perspective of ensemble learning, Leaky ReLU can be considered an ensemble of networks with different convolution-to-ReLU ratios. Through the analysis of a simple Leaky ReLU model, we find that the proportional module with an N:M (N>M) ratio can help networks achieve better performance. With this module, many popular networks can form richer representations, since the N:M (N>M) proportional module can utilize information more effectively. Furthermore, we apply this module to diverse DCNN models to examine whether the N:M (N>M) convolution-to-ReLU ratio is indeed more effective. Our experimental results show that this simple yet effective method achieves better performance on different benchmarks with various network architectures, verifying the superiority of the proportional module.
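
To make the N:M idea concrete, here is a minimal pure-Python sketch; it is not the authors' implementation. It relies on one standard identity, leaky_relu(x) = alpha * x + (1 - alpha) * relu(x) for 0 <= alpha <= 1, which is one way to read Leaky ReLU as a weighted mix of a "no ReLU" path and a "ReLU" path. The helper names (`conv1d`, `proportional_module`, `relu_after`) and the choice of which layers receive a ReLU are assumptions made for illustration; the abstract does not specify where the M ReLUs are placed.

```python
# Illustrative sketch only: toy 1-D "convolutions" in pure Python.
# All names here are hypothetical and not taken from the paper.

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

def leaky_as_mixture(x, alpha=0.01):
    # Identity path (no ReLU) weighted by alpha,
    # ReLU path weighted by (1 - alpha).
    return alpha * x + (1.0 - alpha) * relu(x)

def conv1d(signal, kernel):
    """Valid 1-D cross-correlation, standing in for a convolutional layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def proportional_module(signal, kernels, relu_after):
    """Apply N = len(kernels) convolutions in sequence.

    ReLU is applied only after the layers whose index is in `relu_after`,
    so M = len(relu_after). Which layers get a ReLU is an assumption here.
    """
    out = list(signal)
    for idx, kernel in enumerate(kernels):
        out = conv1d(out, kernel)
        if idx in relu_after:
            out = [relu(v) for v in out]
    return out

signal = [1.0, -2.0, 3.0, -4.0, 5.0, -6.0]
kernels = [[1.0, 1.0], [1.0, -1.0]]

# Conventional paired design: 2 convolutions, 2 ReLUs (1:1 ratio).
paired = proportional_module(signal, kernels, relu_after={0, 1})

# Proportional design: 2 convolutions, 1 ReLU (N:M = 2:1).
proportional = proportional_module(signal, kernels, relu_after={1})
```

On this toy input the 2:1 variant passes the negative intermediate values to the second convolution instead of zeroing them, so the two designs produce different outputs. This only illustrates the mechanism and says nothing about accuracy on real benchmarks.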



Activation Functions: Do They Represent A Trade-Off Between Modular Nature of Neural Networks And Task Performance

Current research suggests that the key factors in designing neural netwo...

EraseReLU: A Simple Way to Ease the Training of Deep Convolution Neural Networks

For most state-of-the-art architectures, Rectified Linear Unit (ReLU) be...

Why ReLU Units Sometimes Die: Analysis of Single-Unit Error Backpropagation in Neural Networks

Recently, neural networks in machine learning use rectified linear units...

Neural Characteristic Activation Value Analysis for Improved ReLU Network Feature Learning

We examine the characteristic activation values of individual ReLU units...

Mathematical Analysis of Adversarial Attacks

In this paper, we analyze efficacy of the fast gradient sign method (FGS...

Multi-Bias Non-linear Activation in Deep Neural Networks

As a widely used non-linear activation, Rectified Linear Unit (ReLU) sep...

Comparisons among different stochastic selection of activation layers for convolutional neural networks for healthcare

Classification of biological images is an important task with crucial ap...
