Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study

04/01/2021
by   Zhiqiang Shen, et al.

This work aims to empirically clarify the recently discovered perspective that label smoothing is incompatible with knowledge distillation. We begin by introducing the motivation behind this claimed incompatibility, i.e., that label smoothing erases the relative information between teacher logits. We provide a novel connection on how label smoothing affects the distributions of semantically similar and dissimilar classes. Then we propose a metric to quantitatively measure the degree of erased information in a sample's representation. After that, we study the one-sidedness and imperfection of the incompatibility view through extensive analyses, visualizations, and comprehensive experiments on Image Classification, Binary Networks, and Neural Machine Translation. Finally, we broadly discuss several circumstances wherein label smoothing will indeed lose its effectiveness. Project page: http://zhiqiangshen.com/projects/LS_and_KD/index.html.
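For context, the two techniques under discussion can be sketched as follows. This is a minimal, illustrative PyTorch snippet of the standard label-smoothing target and the standard Hinton-style distillation loss, not the paper's proposed metric; function names and the hyperparameter values are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F


def label_smoothing_targets(labels: torch.Tensor, num_classes: int, eps: float = 0.1) -> torch.Tensor:
    """Standard label smoothing: mix the one-hot target with a uniform distribution.

    Each target becomes (1 - eps) * one_hot + eps / K, which flattens the
    distribution over non-ground-truth classes.
    """
    one_hot = F.one_hot(labels, num_classes).float()
    return one_hot * (1.0 - eps) + eps / num_classes


def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, T: float = 4.0) -> torch.Tensor:
    """Standard knowledge-distillation term: KL divergence between temperature-softened
    teacher and student distributions (Hinton et al.).

    The relative differences between teacher logits carry the "dark knowledge"
    that, per the incompatibility argument, label smoothing tends to erase.
    """
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)


# Toy usage with random logits for a batch of 4 samples over 10 classes.
labels = torch.tensor([1, 3, 0, 7])
smoothed = label_smoothing_targets(labels, num_classes=10)
loss = kd_loss(torch.randn(4, 10), torch.randn(4, 10))
```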

