With the success of large-scale pre-trained models (PTMs), how efficient...
In this paper, we take advantage of previous pre-trained models (PTM...
Knowledge distillation (KD) has gained much attention due to its
effecti...
In this paper, we propose a new intermediate supervision method, named
La...