Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations

07/26/2021
by   Laurent Benaroya, et al.
0

Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate part of its content, primarily its identity, while maintaining the rest unchanged. Research in neural VC has accomplished considerable breakthroughs with the capacity to falsify a voice identity using a small amount of data with a highly realistic rendering. This paper goes beyond voice identity and presents a neural architecture that allows the manipulation of voice attributes (e.g., gender and age). Leveraging the latest advances on adversarial learning of structured speech representation, a novel structured neural network is proposed in which multiple auto-encoders are used to encode speech as a set of idealistically independent linguistic and extra-linguistic representations, which are learned adversariarly and can be manipulated during VC. Moreover, the proposed architecture is time-synchronized so that the original voice timing is preserved during conversion which allows lip-sync applications. Applied to voice gender conversion on the real-world VCTK dataset, our proposed architecture can learn successfully gender-independent representation and convert the voice gender with a very high efficiency and naturalness.

READ FULL TEXT

page 10

page 21

research
10/07/2021

Sequence-To-Sequence Voice Conversion using F0 and Time Conditioning and Adversarial Learning

This paper presents a sequence-to-sequence voice conversion (S2S-VC) alg...
research
08/05/2020

Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning

This paper presents an adversarial learning method for recognition-synth...
research
12/03/2019

Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

We propose a flexible framework that deals with both singer conversion a...
research
10/22/2019

SoftGAN: Learning generative models efficiently with application to CycleGAN Voice Conversion

Voice conversion with deep neural networks has become extremely popular ...
research
04/22/2021

Protecting gender and identity with disentangled speech representations

Besides its linguistic content, our speech is rich in biometric informat...
research
10/20/2022

DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion

Voice conversion is a task to convert a non-linguistic feature of a give...
research
10/31/2022

VoicePrivacy 2022 System Description: Speaker Anonymization with Feature-matched F0 Trajectories

We introduce a novel method to improve the performance of the VoicePriva...

Please sign up or login with your details

Forgot password? Click here to reset