Efficient Sub-structured Knowledge Distillation

03/09/2022
by Wenye Lin, et al.

Structured prediction models address problems whose output is a complex structure rather than a single variable. Performing knowledge distillation for such models is non-trivial because their output space is exponentially large. In this work, we propose an approach that is much simpler to formulate and far more efficient to train than existing approaches. Specifically, we transfer knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of over the whole output space. In this way, we avoid time-consuming techniques such as dynamic programming (DP) for decoding output structures; this permits parallel computation and makes training even faster in practice. In addition, it encourages the student model to better mimic the internal behavior of the teacher model. Experiments on two structured prediction tasks demonstrate that our approach outperforms previous methods and halves the training time per epoch.
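As a rough illustration of "locally matching predictions on all sub-structures", here is a minimal sketch that assumes a sequence-labeling setting: each token's label is treated as one sub-structure, and the teacher and student each produce one score per candidate label. The loss sums the KL divergence between the teacher's and student's local softmax distributions. The function names (`log_softmax`, `substructure_kd_loss`), the `temperature` parameter and the choice of KL divergence are our own assumptions for illustration, not the paper's exact objective.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over the candidate labels of one sub-structure."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def substructure_kd_loss(
    teacher_scores: Sequence[Sequence[float]],
    student_scores: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """Hypothetical sub-structure distillation loss (illustrative assumption).

    Each row holds the scores of one sub-structure (e.g. one token's labels).
    Returns the sum over sub-structures of KL(teacher || student) between
    their local softmax distributions. Rows are scored independently, so no
    dynamic programming over the whole output structure is required and the
    rows could be processed in parallel.
    """
    if len(teacher_scores) != len(student_scores):
        raise ValueError("teacher and student must score the same sub-structures")
    total = 0.0
    for t_row, s_row in zip(teacher_scores, student_scores):
        if len(t_row) != len(s_row):
            raise ValueError("each sub-structure needs the same label set")
        log_p_t = log_softmax(t_row, temperature)
        log_p_s = log_softmax(s_row, temperature)
        # KL(p_t || p_s) = sum_y p_t(y) * (log p_t(y) - log p_s(y))
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return total


# Usage: two tokens, three candidate labels each (made-up scores).
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
loss = substructure_kd_loss(teacher, student, temperature=2.0)
```

Because the loss is a plain sum of per-sub-structure terms, its cost grows linearly with the number of sub-structures. A whole-structure objective would instead have to account for every label sequence, whose count grows exponentially with sequence length.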
