Neural voice cloning with a few low-quality samples

06/12/2020
by   Sunghee Jung, et al.
0

In this paper, we explore the possibility of speech synthesis from low quality found data using only limited number of samples of target speaker. We try to extract only the speaker embedding from found data of target speaker unlike previous works which tries to train the entire text-to-speech system on found data. Also, the two speaker mimicking approaches which are adaptation and speaker-encoder-based are applied on newly released LibriTTS dataset and previously released VCTK corpus to examine the impact of speaker variety on clarity and target-speaker-similarity .

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/04/2018

Voice Imitating Text-to-Speech Neural Networks

We propose a neural text-to-speech (TTS) model that can imitate a new sp...
research
02/14/2018

Neural Voice Cloning with a Few Samples

Voice cloning is a highly desired feature for personalized speech interf...
research
09/15/2023

A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism

We introduce a distinctive real-time, causal, neural network-based activ...
research
05/24/2022

TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS

Recently, synthesizing personalized speech by text-to-speech (TTS) appli...
research
05/05/2021

How do Voices from Past Speech Synthesis Challenges Compare Today?

Shared challenges provide a venue for comparing systems trained on commo...
research
08/02/2021

Speaker Adaptation with Continuous Vocoder-based DNN-TTS

Traditional vocoder-based statistical parametric speech synthesis can be...
research
02/18/2022

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

Dysarthric speech reconstruction (DSR), which aims to improve the qualit...

Please sign up or login with your details

Forgot password? Click here to reset