The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework

by Chao Wang, et al.

In label-efficient learning on video data, both the distillation method and the structural design of the teacher-student architecture strongly affect the outcome of knowledge distillation. However, previous research has overlooked the relationship between these two factors. To address this gap, we propose a new weakly supervised learning framework for knowledge distillation in video classification, designed to improve both the efficiency and the accuracy of the student model. Our approach applies substage-based learning: knowledge is distilled according to the combination of student substages and the correlation between corresponding substages. We also employ progressive cascade training to mitigate the accuracy loss caused by a large capacity gap between the teacher and the student. Additionally, we propose a pseudo-label optimization strategy to refine the initial data labels. To optimize the loss functions of the different distillation substages during training, we introduce a new loss based on feature distributions. Extensive experiments on both real and simulated datasets show that our approach outperforms existing distillation methods on video classification tasks. The proposed substage-based distillation approach may inform future research on label-efficient learning for video data.
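The abstract does not give the loss formulas. As a point of reference, the sketch below shows the standard temperature-scaled distillation loss (Hinton et al.) together with a hypothetical cascade in which each smaller stage distills from the next-larger one, in the spirit of the progressive cascade training described above. The function names, the temperature of 4.0, the weight `alpha = 0.5`, and the cascade structure are illustrative assumptions; this is not the authors' substage-based method, and their feature-distribution loss is not reproduced here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Standard Hinton-style distillation loss for a single sample.

    alpha weights the soft (teacher) term and (1 - alpha) weights the hard
    cross-entropy term on `label`, which could be a (pseudo-)label index.
    The soft term is multiplied by T^2 so its gradient scale stays comparable
    to the hard term when the temperature changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


def cascade_losses(stage_logits: Sequence[Sequence[float]], label: int,
                   temperature: float = 4.0, alpha: float = 0.5) -> List[float]:
    """Hypothetical progressive cascade: stage k + 1 distills from stage k.

    stage_logits[0] is the largest model (the teacher) and the last entry is
    the final student; intermediate entries act as assistant models that
    bridge the teacher-student capacity gap.
    """
    return [
        kd_loss(stage_logits[k + 1], stage_logits[k], label, temperature, alpha)
        for k in range(len(stage_logits) - 1)
    ]


if __name__ == "__main__":
    # Toy 3-class logits for one video clip (made-up numbers for illustration).
    teacher = [4.0, 1.0, 0.5]
    assistant = [3.0, 1.2, 0.4]
    student = [1.5, 1.0, 0.8]
    print([round(x, 4) for x in cascade_losses([teacher, assistant, student], label=0)])
```

With three stages the function returns two losses, one per teacher-to-student hop. In an actual training loop each loss would presumably be minimized with respect to the smaller model's parameters, for example one stage at a time, so that no single hop has to span the full capacity gap.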




Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

Knowledge distillation has been used to transfer knowledge learned by a ...

Improved Knowledge Distillation via Adversarial Collaboration

Knowledge distillation has become an important approach to obtain a comp...

Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation

It has been commonly observed that a teacher model with superior perform...

Bi-directional Weakly Supervised Knowledge Distillation for Whole Slide Image Classification

Computer-aided pathology diagnosis based on the classification of Whole ...

Learning Deep Nets for Gravitational Dynamics with Unknown Disturbance through Physical Knowledge Distillation: Initial Feasibility Study

Learning high-performance deep neural networks for dynamic modeling of h...

Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval

Previous Knowledge Distillation based efficient image retrieval methods ...

Weakly Supervised Semantic Segmentation via Alternative Self-Dual Teaching

Current weakly supervised semantic segmentation (WSSS) frameworks usuall...
