On the Impact of Knowledge Distillation for Model Interpretability

05/25/2023
by   Hyeongrok Han, et al.

Several recent studies have elucidated why knowledge distillation (KD) improves model performance, but few have examined the advantages KD offers beyond accuracy. In this study, we show that KD enhances the interpretability as well as the accuracy of models. For a quantitative comparison of model interpretability, we measured the number of concept detectors identified by network dissection. We attribute the improvement in interpretability to the class-similarity information transferred from the teacher to the student model. First, we confirmed that class-similarity information is transferred from the teacher to the student via logit distillation. Then, we analyzed how class-similarity information affects model interpretability, both through its presence or absence and through the degree of similarity transferred. We conducted various quantitative and qualitative experiments across different datasets, different KD methods, and different measures of interpretability. Our results suggest that models distilled from large teacher models can be used more reliably in various fields.
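For context, the sketch below illustrates standard Hinton-style logit distillation, the mechanism the abstract credits with transferring class-similarity information: the softened teacher distribution encodes how similar the teacher considers the classes to be, and the KL term passes that structure to the student. This is a minimal illustrative sketch, not the paper's code; the function name `kd_loss` and the hyperparameter values are assumptions.

```python
# Minimal sketch of logit distillation (soft-target KD).
# Names and hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Combine the soft-target distillation loss with standard cross-entropy.

    The temperature-softened teacher distribution exposes inter-class
    similarity; minimizing the KL divergence transfers that similarity
    structure to the student.
    """
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)

    # KL divergence between softened distributions, scaled by T^2 as usual.
    distill = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Supervised loss on the hard labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * distill + (1.0 - alpha) * ce

# Example usage with random logits for a 10-class problem.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = kd_loss(student_logits, teacher_logits, labels)
```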
