Feature Incay for Representation Regularization

05/29/2017
by Yuhui Yuan, et al.

Softmax loss is widely used in deep neural networks for multi-class classification, where each class is represented by a weight vector and each sample by a feature vector; a sample is correctly classified when its feature vector has the largest projection onto the weight vector of the correct class. To improve generalization, weight decay, which shrinks the weight norm, is commonly used as a regularizer. Unlike traditional learning algorithms, in which features are fixed and only the weights are tunable, deep learning also tunes the features through representation learning. We therefore propose feature incay, which regularizes representation learning by favoring feature vectors with large norm when the samples are correctly classified. With feature incay, feature vectors are pushed further from the origin along the directions of their corresponding weight vectors, improving inter-class separability. In addition, feature incay encourages intra-class compactness along the weight-vector directions by increasing small feature norms faster than large ones. Empirical results on MNIST, CIFAR10, and CIFAR100 demonstrate that feature incay improves generalization.
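To make the idea concrete, below is a minimal PyTorch-style sketch of one plausible form of the incay term: a reciprocal squared-norm penalty applied only to correctly classified samples, so that small feature norms receive larger gradients and grow faster than large ones. The function name, the exact penalty form, and the hyperparameters are illustrative assumptions, not the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def softmax_loss_with_feature_incay(features, logits, targets, lam=1e-2, eps=1e-8):
    """Hypothetical feature-incay regularized loss.

    features: (N, D) penultimate-layer feature vectors
    logits:   (N, C) class scores (features projected onto class weight vectors)
    targets:  (N,)   ground-truth labels
    lam:      weight of the incay term (assumed hyperparameter)
    """
    ce = F.cross_entropy(logits, targets)

    # Apply the incay term only to samples the model already classifies correctly.
    correct = (logits.argmax(dim=1) == targets).float()

    # Reciprocal squared-norm penalty: its gradient w.r.t. the norm grows as the
    # norm shrinks, so small-norm features are enlarged faster than large-norm ones.
    sq_norm = features.pow(2).sum(dim=1)
    incay = (correct / (sq_norm + eps)).mean()

    return ce + lam * incay
```

In this sketch the incay term plays the opposite role to weight decay: rather than shrinking the weights, it pushes correctly classified features away from the origin along their class directions.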

