Respecting Transfer Gap in Knowledge Distillation

10/23/2022
by Yulei Niu, et al.

Knowledge distillation (KD) is essentially a process of transferring a teacher model's behavior, e.g., its network response, to a student model. The network response serves as additional supervision to formulate the machine domain, which uses the data collected from the human domain as a transfer set. Traditional KD methods hold the underlying assumption that the data collected in the human domain and the machine domain are both independent and identically distributed (IID). We point out that this naive assumption is unrealistic and that there is indeed a transfer gap between the two domains. Although the gap offers the student model external knowledge from the machine domain, the imbalanced teacher knowledge leads us to incorrectly estimate how much to transfer from teacher to student per sample on the non-IID transfer set. To tackle this challenge, we propose Inverse Probability Weighting Distillation (IPWD), which estimates the propensity score of a training sample belonging to the machine domain and assigns its inverse amount to compensate for under-represented samples. Experiments on CIFAR-100 and ImageNet demonstrate the effectiveness of IPWD for both two-stage distillation and one-stage self-distillation.
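To make the inverse-probability-weighting idea concrete, here is a minimal PyTorch-style sketch of a per-sample weighted KD loss. It is an illustration under stated assumptions, not the authors' exact method: the propensity proxy (teacher confidence on the ground-truth class), the temperature value, the weight normalization, and the function name `ipw_distillation_loss` are all assumptions for the example.

```python
# Illustrative sketch only: the propensity estimator and loss form are assumptions,
# not the IPWD implementation from the paper.
import torch
import torch.nn.functional as F

def ipw_distillation_loss(student_logits, teacher_logits, labels,
                          temperature: float = 4.0, eps: float = 1e-6):
    """Weight the per-sample KD loss by the inverse of an (assumed) propensity
    score that a sample is well represented in the teacher's (machine) domain."""
    t = temperature
    # Per-sample KL divergence between softened teacher and student distributions.
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    kd_per_sample = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1) * (t * t)

    # Assumed propensity proxy: the teacher's confidence on the ground-truth class.
    # Samples the teacher represents poorly get a small propensity, hence a large weight.
    with torch.no_grad():
        propensity = F.softmax(teacher_logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
        weights = 1.0 / (propensity + eps)
        weights = weights / weights.mean()  # normalize so the average weight is 1

    ce = F.cross_entropy(student_logits, labels)
    return ce + (weights * kd_per_sample).mean()


# Toy usage: a batch of 8 samples over 100 classes with random logits.
if __name__ == "__main__":
    student_logits = torch.randn(8, 100, requires_grad=True)
    teacher_logits = torch.randn(8, 100)
    labels = torch.randint(0, 100, (8,))
    loss = ipw_distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(float(loss))
```

The design point the sketch tries to capture is that the weighting is applied only to the distillation term: under-represented samples receive larger weights so the student is not dominated by the teacher's well-represented (over-confident) regions of the non-IID transfer set.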
