Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks

02/20/2017
by Chunjie Luo, et al.

Traditionally, multi-layer neural networks use the dot product between the output vector of the previous layer and the incoming weight vector as the input to the activation function. The result of the dot product is unbounded, which increases the risk of large variance. Large variance of a neuron makes the model sensitive to changes in the input distribution, resulting in poor generalization, and aggravates internal covariate shift, which slows down training. To bound the dot product and decrease the variance, we propose using cosine similarity or centered cosine similarity (the Pearson correlation coefficient) instead of the dot product in neural networks, which we call cosine normalization. We compare cosine normalization with batch, weight, and layer normalization in fully-connected neural networks as well as convolutional networks on the MNIST, 20 NEWSGROUP, CIFAR-10/100 and SVHN data sets. Experiments show that cosine normalization achieves better performance than the other normalization techniques.
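The core operation is easy to sketch. The following minimal NumPy illustration (not the authors' released code) shows a fully-connected layer whose pre-activations are cosine similarities between each incoming weight vector and the input, rather than dot products; setting centered=True subtracts the means of both vectors first, giving the Pearson correlation coefficient. The function name, the eps safeguard, and the toy dimensions are assumptions made for this example.

import numpy as np

def cosine_norm_preactivation(x, W, eps=1e-8, centered=False):
    """Pre-activations with cosine normalization.

    Instead of the unbounded dot product w . x, each pre-activation is the
    cosine similarity between the incoming weight vector w and the input x,
    which is bounded in [-1, 1]. With centered=True the means of w and x
    are removed first, i.e. the Pearson correlation coefficient is used.

    x : (batch, in_features) output of the previous layer
    W : (out_features, in_features) weight matrix, one row per neuron
    """
    if centered:
        x = x - x.mean(axis=1, keepdims=True)
        W = W - W.mean(axis=1, keepdims=True)
    # Dot products for all neurons at once: shape (batch, out_features)
    dots = x @ W.T
    # Norms of the input vectors and of each neuron's weight vector
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)       # (batch, 1)
    w_norm = np.linalg.norm(W, axis=1, keepdims=True).T     # (1, out_features)
    return dots / (x_norm * w_norm + eps)


# Toy usage: one hidden layer with a tanh non-linearity on random data
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 10))      # batch of 4, 10 input features
W = rng.normal(size=(32, 10))     # 32 hidden neurons
h = np.tanh(cosine_norm_preactivation(x, W, centered=True))
print(h.shape, h.min(), h.max())  # activations stay within (-1, 1)

Because every pre-activation now lies in [-1, 1], a saturating non-linearity such as tanh never sees arbitrarily large inputs, which is the bounding effect the abstract refers to.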
