Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

03/21/2023
by   Sung-Feng Huang, et al.

Personalized TTS is an exciting and highly desired application that lets users train a TTS voice of their own from only a few recordings. However, TTS training typically requires many hours of recordings and a large model, which makes it unsuitable for deployment on mobile devices. To work around the data requirement, related works typically fine-tune a pre-trained TTS model, preserving its ability to generate high-quality audio while adapting it to the target speaker's voice. This process is commonly referred to as “voice cloning.” Although related works have been very successful at changing the TTS model's voice, they still fine-tune from a large pre-trained model, so the voice-cloned model remains large. In this paper, we propose applying trainable structured pruning to voice cloning. By training the structured pruning masks on voice-cloning data, we can produce a unique pruned model for each target speaker. Our experiments demonstrate that, with learnable structured pruning, we can make the model 7 times smaller while achieving comparable voice-cloning performance.
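As a rough illustration of what structured pruning with a learned mask means, the sketch below gates each hidden channel of a small feed-forward block and then physically removes the gated-off channels. It is only a sketch under stated assumptions: the abstract does not say how the masks are parameterized or trained, so the sigmoid gate, the 0.5 threshold, the layer sizes, and all function names here are hypothetical, and the mask logits are fixed by hand instead of being learned from voice-cloning data.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ffn(x, w1, b1, w2, gates=None):
    """Linear -> ReLU -> Linear, with an optional per-channel gate on the hidden layer."""
    hidden = []
    for i, (row, b) in enumerate(zip(w1, b1)):
        h = max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        if gates is not None:
            h *= gates[i]
        hidden.append(h)
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def prune_ffn(w1, b1, w2, mask_logits, threshold=0.5):
    """Remove whole hidden channels whose gate sigmoid(logit) is below threshold.

    Dropping channel i deletes row i of w1, entry i of b1, and column i of w2,
    so the result is a smaller dense block, not a sparse one.
    """
    keep = [i for i, s in enumerate(mask_logits) if sigmoid(s) >= threshold]
    w1_p = [w1[i] for i in keep]
    b1_p = [b1[i] for i in keep]
    w2_p = [[row[i] for i in keep] for row in w2]
    return w1_p, b1_p, w2_p, keep


def count_params(w1, b1, w2):
    return sum(len(r) for r in w1) + len(b1) + sum(len(r) for r in w2)


random.seed(0)
d_in, d_hidden, d_out = 4, 8, 4  # toy sizes, chosen only for the example
w1 = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [random.uniform(-1, 1) for _ in range(d_hidden)]
w2 = [[random.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(d_out)]

# Hypothetical mask logits; in a trainable setup these would be optimized
# per target speaker alongside (or instead of) the model weights.
mask_logits = [2.0, -3.0, 1.5, -0.5, -2.0, 0.8, -1.0, -4.0]

w1_p, b1_p, w2_p, keep = prune_ffn(w1, b1, w2, mask_logits)

x = [0.3, -0.7, 1.1, 0.5]
hard_gates = [1.0 if i in keep else 0.0 for i in range(d_hidden)]
dense_out = ffn(x, w1, b1, w2, gates=hard_gates)  # full-size block, masked
pruned_out = ffn(x, w1_p, b1_p, w2_p)             # physically smaller block

dense_params = count_params(w1, b1, w2)
pruned_params = count_params(w1_p, b1_p, w2_p)
print(f"kept channels: {keep}")
print(f"parameters: {dense_params} -> {pruned_params}")
```

The masked full-size block and the physically pruned block compute the same output, which is why structured pruning yields a genuinely smaller model, unlike unstructured sparsity, which only zeroes individual weights inside tensors of unchanged shape. Because the mask is a separate set of parameters, different speakers can end up keeping different channels.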


