Knowledge Distillation by On-the-Fly Native Ensemble

06/12/2018
by Xu Lan, et al.

Knowledge distillation is an effective way to train small, generalisable network models that meet low-memory and fast-execution requirements. Existing offline distillation methods rely on a strong pre-trained teacher, which enables favourable knowledge discovery and transfer but requires a complex two-phase training procedure. Online counterparts address this limitation at the price of lacking a high-capacity teacher. In this work, we present an On-the-fly Native Ensemble (ONE) strategy for one-stage online distillation. Specifically, ONE trains only a single multi-branch network while simultaneously establishing a strong teacher on-the-fly to enhance the learning of the target network. Extensive evaluations show that ONE improves the generalisation performance of a variety of deep neural networks more significantly than alternative methods on four image classification datasets (CIFAR10, CIFAR100, SVHN, and ImageNet), whilst also retaining computational efficiency advantages.
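To make the described training scheme concrete, below is a minimal, hedged PyTorch sketch of the idea: a shared backbone feeds several parallel branches, a gating head combines the branch logits into an on-the-fly teacher, and each branch is distilled from that teacher while all components are trained jointly in one stage. The tiny convolutional backbone, the three linear branches, the temperature T=3, the detached teacher target, and the names ONEModel/one_loss are illustrative assumptions, not the paper's exact architecture or hyper-parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ONEModel(nn.Module):
    """Sketch of an On-the-fly Native Ensemble (ONE) style classifier.

    Assumptions: a tiny convolutional backbone, identical linear branches,
    and a linear gating head (stand-ins for the real network design).
    """
    def __init__(self, num_classes=10, num_branches=3):
        super().__init__()
        # Shared low-level layers.
        self.shared = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Parallel high-level branches; each produces its own logits.
        self.branches = nn.ModuleList(
            [nn.Linear(32, num_classes) for _ in range(num_branches)]
        )
        # Gating head: one importance score per branch, softmax-normalised.
        self.gate = nn.Linear(32, num_branches)

    def forward(self, x):
        feat = self.shared(x)
        branch_logits = torch.stack([b(feat) for b in self.branches], dim=1)  # (B, m, C)
        gate = F.softmax(self.gate(feat), dim=1)                              # (B, m)
        teacher_logits = (gate.unsqueeze(-1) * branch_logits).sum(dim=1)      # (B, C)
        return branch_logits, teacher_logits


def one_loss(branch_logits, teacher_logits, targets, T=3.0):
    """Cross-entropy on the teacher and every branch, plus distillation of
    each branch towards the on-the-fly teacher (KL at temperature T)."""
    loss = F.cross_entropy(teacher_logits, targets)
    # Teacher target is detached here for simplicity (a common variant).
    soft_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    for i in range(branch_logits.size(1)):
        logits_i = branch_logits[:, i]
        loss = loss + F.cross_entropy(logits_i, targets)
        loss = loss + F.kl_div(F.log_softmax(logits_i / T, dim=1),
                               soft_teacher, reduction="batchmean") * (T * T)
    return loss


# Usage sketch on random data.
model = ONEModel()
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
branch_logits, teacher_logits = model(x)
loss = one_loss(branch_logits, teacher_logits, y)
loss.backward()
```

At test time, any single branch (or the gated ensemble) can serve as the deployed model, which is where the memory and speed advantage of the approach comes from.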


Related research

Peer Collaborative Learning for Online Knowledge Distillation (06/07/2020)
Traditional knowledge distillation uses a two-stage training strategy to...

Self-Referenced Deep Learning (11/19/2018)
Knowledge distillation is an effective approach to transferring knowledg...

PILE: Pairwise Iterative Logits Ensemble for Multi-Teacher Labeled Distillation (11/11/2022)
Pre-trained language models have become a crucial part of ranking system...

Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network (03/09/2020)
There is a need for an on-the-fly computational process with very low pe...

Extrapolating from a Single Image to a Thousand Classes using Distillation (12/01/2021)
What can neural networks learn about the visual world from a single imag...

Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation (11/01/2022)
Methods for improving the efficiency of deep network training (i.e. the ...

Resolution Switchable Networks for Runtime Efficient Image Recognition (07/19/2020)
We propose a general method to train a single convolutional neural netwo...
