DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models

05/28/2023
by Yifan Peng, et al.

Self-supervised learning (SSL) has achieved notable success in many speech processing tasks, but the large model size and heavy computational cost hinder their deployment. Knowledge distillation trains a small student model to mimic the behavior of a large teacher model. However, the student architecture usually needs to be manually designed and remains fixed during training, which requires prior knowledge and can lead to suboptimal performance. Inspired by the recent success of task-specific structured pruning, we propose DPHuBERT, a novel task-agnostic compression method for speech SSL based on joint distillation and pruning. Experiments on SUPERB show that DPHuBERT outperforms pure distillation methods in almost all tasks. Moreover, DPHuBERT requires little training time and performs well with limited training data, making it suitable for resource-constrained applications. Our method can also be applied to various speech SSL models. Our code and models will be publicly available.
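To make the idea of joint distillation and pruning concrete, below is a minimal sketch, not the authors' implementation: a student copies the teacher's layer layout, each student block carries a learnable gate, and training minimizes a layer-wise distillation loss plus a sparsity penalty that pushes gates toward zero so the corresponding blocks can be pruned afterwards. The gate parameterization, loss choices, and weights here are illustrative assumptions.

```python
# Sketch of a joint distillation-and-pruning objective (illustrative only).
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Feed-forward block whose contribution is scaled by a learnable gate."""
    def __init__(self, dim: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        # sigmoid(gate) near 0 means this block can be removed after training.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return x + torch.sigmoid(self.gate) * self.ff(x)

class Student(nn.Module):
    def __init__(self, dim: int = 64, n_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(GatedBlock(dim) for _ in range(n_layers))

    def forward(self, x):
        hidden = []
        for layer in self.layers:
            x = layer(x)
            hidden.append(x)
        return hidden  # per-layer hidden states for layer-wise distillation

def joint_loss(student_hidden, teacher_hidden, gates, sparsity_weight=0.05):
    """Layer-wise L1 distillation loss plus an L1 penalty on the gates."""
    distill = sum(nn.functional.l1_loss(s, t)
                  for s, t in zip(student_hidden, teacher_hidden))
    sparsity = sum(torch.sigmoid(g).sum() for g in gates)
    return distill + sparsity_weight * sparsity

# Toy usage: random tensors stand in for frozen teacher activations.
dim, n_layers = 64, 4
student = Student(dim, n_layers)
x = torch.randn(8, dim)
teacher_hidden = [torch.randn(8, dim) for _ in range(n_layers)]
optim = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(10):
    optim.zero_grad()
    student_hidden = student(x)
    loss = joint_loss(student_hidden, teacher_hidden,
                      [layer.gate for layer in student.layers])
    loss.backward()
    optim.step()

# After training, blocks whose sigmoid(gate) falls below a threshold could be
# pruned, yielding a smaller student whose structure was learned rather than
# hand-designed.
```

In this toy setup the sparsity weight trades off compression against fidelity to the teacher; the actual DPHuBERT recipe controls the target size differently, so treat the constant above purely as a placeholder.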


