MixUp Training Leads to Reduced Overfitting and Improved Calibration for the Transformer Architecture

02/22/2021
by Wancong Zhang, et al.

MixUp is a computer vision data augmentation technique that uses convex interpolations of input data and their labels to enhance model generalization during training. However, the application of MixUp to the natural language understanding (NLU) domain has been limited due to the difficulty of interpolating text directly in the input space. In this study, we propose MixUp methods at the Input, Manifold, and sentence embedding levels for the transformer architecture, and apply them to finetune the BERT model for a diverse set of NLU tasks. We find that MixUp can improve model performance, as well as reduce test loss and model calibration error by up to 50%.
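For concreteness, the convex interpolation the abstract describes can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration of generic MixUp on a batch, not the paper's exact recipe: the tensor names, the Beta(alpha, alpha) mixing distribution, and alpha = 0.4 are illustrative assumptions. The paper applies this same combination at the Input, Manifold (hidden-state), and sentence embedding levels.

import torch

def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.4):
    """Convexly interpolate a batch of inputs/embeddings x and one-hot
    labels y with a randomly permuted copy of the same batch.

    lambda is drawn from Beta(alpha, alpha), as in the original MixUp
    formulation; alpha = 0.4 is an illustrative choice, not the paper's.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))           # pair each example with a random partner
    x_mix = lam * x + (1.0 - lam) * x[perm]    # x_tilde = lam * x_i + (1 - lam) * x_j
    y_mix = lam * y + (1.0 - lam) * y[perm]    # y_tilde = lam * y_i + (1 - lam) * y_j
    return x_mix, y_mix

# Hypothetical usage on sentence embeddings from a BERT-style encoder:
# emb has shape (batch, hidden); labels are one-hot over the classes.
emb = torch.randn(8, 768)
labels = torch.nn.functional.one_hot(torch.randint(0, 2, (8,)), 2).float()
mixed_emb, mixed_labels = mixup(emb, labels)

Because the labels are mixed along with the inputs, the model is trained against soft targets, which is one intuition for the reduced overfitting and improved calibration the paper reports.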
