Online Distillation with Mixed Sample Augmentation

06/24/2022
by Yiqing Shen, et al.

Mixed Sample Regularization (MSR), such as MixUp or CutMix, is a powerful data augmentation strategy for improving the generalization of convolutional neural networks. Previous empirical analyses have shown that the performance gains from MSR and from conventional offline Knowledge Distillation (KD) are orthogonal; specifically, a student network can be further improved by applying MSR during the training stage of sequential distillation. Yet the interplay between MSR and online knowledge distillation, a stronger distillation paradigm in which an ensemble of peer students learns mutually from one another, remains unexplored. To bridge this gap, we make the first attempt at incorporating CutMix into online distillation, where we empirically observe a significant improvement. Encouraged by this result, we propose an even stronger MSR designed specifically for online distillation, named Cut^nMix. Furthermore, we design a novel online distillation framework upon Cut^nMix that enhances distillation with feature-level mutual learning and a self-ensemble teacher. Comprehensive evaluations on CIFAR-10 and CIFAR-100 with six network architectures show that our approach consistently outperforms state-of-the-art distillation methods.
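The abstract builds on two ingredients: CutMix-style mixed sample augmentation and online (mutual) distillation among peer students. Below is a minimal PyTorch sketch of one training step combining the two, assuming two or more peer networks trained with logit-level mutual learning in the style of deep mutual learning. The helper names `cutmix` and `mutual_step` are illustrative, not the paper's released code, and this sketch implements plain CutMix with a KL mutual-learning term; it does not include the paper's Cut^nMix, feature-level mutual learning, or self-ensemble teacher.

```python
# Minimal sketch: CutMix + online mutual distillation (illustrative, not the
# authors' implementation). Assumes PyTorch and >= 2 peer student networks.
import torch
import torch.nn.functional as F


def cutmix(x, y, alpha=1.0):
    """Paste a random patch from a shuffled copy of the batch into each image
    and mix the labels in proportion to the patch area (standard CutMix)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    h, w = x.shape[2], x.shape[3]
    rh, rw = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    t, b = max(cy - rh // 2, 0), min(cy + rh // 2, h)
    l, r = max(cx - rw // 2, 0), min(cx + rw // 2, w)
    x_mixed = x.clone()
    x_mixed[:, :, t:b, l:r] = x[perm, :, t:b, l:r]
    lam = 1.0 - (b - t) * (r - l) / (h * w)  # recompute from the clipped patch
    return x_mixed, y, y[perm], lam


def mutual_step(students, optimizers, x, y, T=3.0):
    """One online-distillation step: every peer fits the CutMix-ed labels and
    matches the softened predictions of the other peers via a KL term."""
    x_mixed, y_a, y_b, lam = cutmix(x, y)
    logits = [s(x_mixed) for s in students]
    for i, (opt, z) in enumerate(zip(optimizers, logits)):
        # Mixed-label cross-entropy, weighted by the surviving image area.
        ce = lam * F.cross_entropy(z, y_a) + (1 - lam) * F.cross_entropy(z, y_b)
        # Average KL to the (detached) soft targets of all other peers.
        kl = sum(
            F.kl_div(F.log_softmax(z / T, dim=1),
                     F.softmax(zj.detach() / T, dim=1),
                     reduction="batchmean") * T * T
            for j, zj in enumerate(logits) if j != i
        ) / (len(students) - 1)
        opt.zero_grad()
        (ce + kl).backward()
        opt.step()
```

With, say, two ResNets and their SGD optimizers, calling `mutual_step` on each batch reproduces the basic CutMix-plus-mutual-learning recipe that the abstract reports as a strong baseline; the proposed Cut^nMix and self-ensemble teacher are further components layered on top of this setup.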
