The Penalty Imposed by Ablated Data Augmentation

by   Frederick Liu, et al.

There is a set of data augmentation techniques that ablate parts of the input at random. These include input dropout, cutout, and random erasing. We term these techniques ablated data augmentation. Though these techniques seems similar in spirit and have shown success in improving model performance in a variety of domains, we do not yet have a mathematical understanding of the differences between these techniques like we do for other regularization techniques like L1 or L2. First, we study a formal model of mean ablated data augmentation and inverted dropout for linear regression. We prove that ablated data augmentation is equivalent to optimizing the ordinary least squares objective along with a penalty that we call the Contribution Covariance Penalty and inverted dropout, a more common implementation than dropout in popular frameworks, is equivalent to optimizing the ordinary least squares objective along with Modified L2. For deep networks, we demonstrate an empirical version of the result if we replace contributions with attributions and coefficients with average gradients, i.e., the Contribution Covariance Penalty and Modified L2 Penalty drop with the increase of the corresponding ablated data augmentation across a variety of networks.


page 1

page 2

page 3

page 4


Dropout as data augmentation

Dropout is typically interpreted as bagging a large number of models sha...

Further advantages of data augmentation on convolutional neural networks

Data augmentation is a popular technique largely used to enhance the tra...

Improved Regularization Techniques for End-to-End Speech Recognition

Regularization is important for end-to-end speech models, since the mode...

Smart Augmentation - Learning an Optimal Data Augmentation Strategy

A recurring problem faced when training neural networks is that there is...

Data Augmentation for Drum Transcription with Convolutional Neural Networks

A recurrent issue in deep learning is the scarcity of data, in particula...

Application of multilayer perceptron with data augmentation in nuclear physics

Neural networks have become popular in many fields of science since they...

Population Based Training for Data Augmentation and Regularization in Speech Recognition

Varying data augmentation policies and regularization over the course of...

Please sign up or login with your details

Forgot password? Click here to reset