CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models

12/01/2022
by Zih-Ching Chen, et al.

Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data. Transformer-based models such as HuBERT, which consist of a feature extractor and transformer layers, are leading the field in the speech domain. SSL models are fine-tuned on a wide range of downstream tasks, which involves re-training the majority of the model for each task. Previous studies have introduced adapters, small lightweight modules commonly used in Natural Language Processing (NLP), to adapt pre-trained models to new tasks. However, such efficient tuning techniques only provide adaptation at the transformer layers and fail to adapt the feature extractor. In this paper, we propose CHAPTER, an efficient tuning method specifically designed for SSL speech models, which applies CNN adapters at the feature extractor. With this method, we fine-tune fewer than 5% of parameters per task compared to full fine-tuning, and achieve better and more stable performance. We empirically find that adding CNN adapters to the feature extractor helps adaptation on emotion and speaker tasks. For instance, the accuracy of speaker identification (SID) is improved from 87.71% to 91.56%, and the accuracy of emotion recognition (ER) is improved by 5%.
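The abstract includes no code, so the following is a minimal PyTorch sketch of the general idea it describes: a small CNN adapter inserted after a frozen convolutional layer of the feature extractor, with only the adapter parameters trained. The bottleneck design, the zero-initialized up-projection, and the names CNNAdapter and AdaptedConvLayer are assumptions made for this example, not the authors' released implementation.

```python
# Illustrative sketch only: the adapter architecture and names below are
# assumptions for the example, not taken from the CHAPTER paper.
import torch
import torch.nn as nn

class CNNAdapter(nn.Module):
    """Lightweight bottleneck 1-D conv adapter with a residual connection."""

    def __init__(self, channels: int, bottleneck: int = 32, kernel_size: int = 3):
        super().__init__()
        self.down = nn.Conv1d(channels, bottleneck, kernel_size=1)
        self.conv = nn.Conv1d(bottleneck, bottleneck, kernel_size,
                              padding=kernel_size // 2)
        self.up = nn.Conv1d(bottleneck, channels, kernel_size=1)
        self.act = nn.GELU()
        # Zero-init the up-projection so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames), as produced by a CNN feature extractor.
        return x + self.up(self.act(self.conv(self.act(self.down(x)))))

class AdaptedConvLayer(nn.Module):
    """Wraps one frozen conv block of the feature extractor with an adapter."""

    def __init__(self, conv_layer: nn.Module, channels: int):
        super().__init__()
        self.conv_layer = conv_layer
        for p in self.conv_layer.parameters():
            p.requires_grad = False          # the backbone stays frozen
        self.adapter = CNNAdapter(channels)  # only the adapter is trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.conv_layer(x))

# Usage: wrap a (hypothetical) 512-channel conv layer of the extractor.
layer = AdaptedConvLayer(nn.Conv1d(512, 512, 3, padding=1), channels=512)
out = layer(torch.randn(2, 512, 100))  # -> shape (2, 512, 100)
```

Because the up-projection starts at zero, training begins from the pre-trained extractor's behavior, and only the small adapter (far under 5% of the backbone's parameters) receives gradients.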

Related research

02/07/2022
Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition
Self-supervised learning (SSL) is a powerful tool that allows learning o...

03/31/2022
An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks
Speech representations learned from Self-supervised learning (SSL) model...

06/09/2023
Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection
Self-supervised speech models are a rapidly developing research topic in...

04/30/2022
StorSeismic: A new paradigm in deep learning for seismic processing
Machine learned tasks on seismic data are often trained sequentially and...

12/13/2020
MiniVLM: A Smaller and Faster Vision-Language Model
Recent vision-language (VL) studies have shown remarkable progress by le...

10/22/2019
Speech-VGG: A deep feature extractor for speech processing
A growing number of studies in the field of speech processing employ fea...

12/06/2022
Parameter Efficient Transfer Learning for Various Speech Processing Tasks
Fine-tuning of self-supervised models is a powerful transfer learning me...
