Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR
Driven by the rapid development of computing hardware and the dramatic growth of data, pre-trained speech recognition models such as Whisper have significantly improved the performance of speech recognition tasks. However, these models usually incur a high computational overhead, making them difficult to run effectively on resource-constrained devices. To speed up inference and reduce model size while maintaining performance, we propose a novel guided knowledge distillation and quantization method for the large pre-trained Whisper model. The student model selects distillation and quantization layers based on the quantization loss and distillation loss, respectively. We compressed Whisper_small to the Whisper_base and Whisper_tiny levels, making it 5.18x and 10.48x smaller, respectively. Moreover, compared to the original Whisper_base and Whisper_tiny, the compressed models also achieve a relative character error rate (CER) reduction of 11.3%.
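The abstract does not spell out how the two losses guide layer selection, so the following is only a minimal, hypothetical sketch of the idea: each student layer is matched to the teacher layer that minimizes a combined hidden-state distillation loss and simulated-quantization loss. The function names (`fake_quantize`, `layer_losses`), the 8-bit uniform quantizer, the toy tensor shapes, and the assumption of matching hidden widths (a real setup with different teacher/student widths would need a learned projection) are all illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate uniform symmetric quantization of a tensor (assumed 8-bit)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = torch.clamp(x.abs().max() / qmax, min=1e-8)
    return torch.round(x / scale) * scale

def layer_losses(teacher_h: torch.Tensor, student_h: torch.Tensor, num_bits: int = 8):
    """Per-layer distillation loss (teacher vs. student hidden states) and
    quantization loss (student hidden states vs. their quantized version)."""
    distill_loss = F.mse_loss(student_h, teacher_h)
    quant_loss = F.mse_loss(fake_quantize(student_h, num_bits), student_h)
    return distill_loss, quant_loss

# Toy example: rank teacher layers for a hypothetical 4-layer student.
torch.manual_seed(0)
teacher_hidden = [torch.randn(2, 50, 512) for _ in range(12)]  # e.g. a 12-layer teacher encoder
student_hidden = [torch.randn(2, 50, 512) for _ in range(4)]   # compressed student encoder

# For each student layer, pick the teacher layer with the smallest combined loss.
for s_idx, s_h in enumerate(student_hidden):
    scores = []
    for t_h in teacher_hidden:
        d, q = layer_losses(t_h, s_h)
        scores.append((d + q).item())
    best = min(range(len(scores)), key=scores.__getitem__)
    print(f"student layer {s_idx} -> teacher layer {best} (loss {scores[best]:.4f})")
```

In this reading, the combined score simply ranks candidate teacher layers; the actual paper may weight the two losses differently or apply the selection in the other direction (choosing which student layers to quantize), which the abstract alone does not resolve.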