Accurate and Structured Pruning for Efficient Automatic Speech Recognition

05/31/2023
by Huiqiang Jiang, et al.

Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy on resource-limited devices. In this paper, we propose a novel compression strategy that leverages structured pruning and knowledge distillation to reduce the model size and inference cost of the Conformer model while preserving high recognition performance. Our approach utilizes a set of binary masks to indicate whether to retain or prune each Conformer module, and employs L0 regularization to learn the optimal mask values. To further enhance pruning performance, we use a layerwise distillation strategy to transfer knowledge from unpruned to pruned models. Our method outperforms all pruning baselines on the widely used LibriSpeech benchmark, achieving a 50% reduction in model size and a 28% reduction in inference cost with minimal performance loss.
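
To make the two ingredients in the abstract concrete, the sketch below (not the authors' code; all class names, shapes, and hyperparameters are assumptions for illustration) shows how a per-module binary mask can be learned with an L0 penalty via the hard-concrete relaxation of Louizos et al. (2018), and how a layerwise distillation loss can be computed between hidden states of an unpruned teacher and the gated student.

```python
# Minimal illustrative sketch, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class L0Gate(nn.Module):
    """Stochastic binary gate z in [0, 1] with a differentiable expected-L0 penalty
    (hard-concrete relaxation)."""
    def __init__(self, beta: float = 2 / 3, gamma: float = -0.1, zeta: float = 1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(1))  # gate logit
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self) -> torch.Tensor:
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        # Stretch to (gamma, zeta) and clamp to [0, 1] so exact zeros/ones are possible.
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def l0_penalty(self) -> torch.Tensor:
        # Expected probability that the gate is non-zero.
        return torch.sigmoid(self.log_alpha
                             - self.beta * torch.log(torch.tensor(-self.gamma / self.zeta)))

class GatedModule(nn.Module):
    """Wraps a residual sub-module (e.g. a Conformer feed-forward or conv block)
    with a learnable gate; a gate driven to zero effectively prunes the block."""
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module, self.gate = module, L0Gate()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.gate() * self.module(x)

def layerwise_distill_loss(student_feats, teacher_feats):
    """MSE between hidden states of matched student/teacher layers."""
    return sum(F.mse_loss(s, t.detach()) for s, t in zip(student_feats, teacher_feats))
```

Under this reading, training would combine the ASR task loss with a weighted sum of the gates' L0 penalties and the layerwise distillation term; at inference, modules whose gates collapse to zero can be dropped outright, which is what yields the reported reductions in model size and inference cost.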


Related research

03/23/2023: Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
Transformer-based models have recently made significant achievements in ...

10/27/2022: Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Recent years have witnessed great strides in self-supervised learning (S...

05/16/2020: Dynamic Sparsity Neural Networks for Automatic Speech Recognition
In automatic speech recognition (ASR), model pruning is a widely adopted...

02/09/2021: Sparsification via Compressed Sensing for Automatic Speech Recognition
In order to achieve high accuracy for machine learning (ML) applications...

05/07/2022: Automatic Block-wise Pruning with Auxiliary Gating Structures for Deep Convolutional Neural Networks
Convolutional neural networks are prevailing in deep learning tasks. How...

12/05/2020: Parallel Blockwise Knowledge Distillation for Deep Neural Network Compression
Deep neural networks (DNNs) have been extremely successful in solving ma...

09/05/2023: TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models
Automatic Speech Recognition (ASR) models need to be optimized for speci...
