Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network Virtual Domain Pairing

02/21/2023
by   Nirmesh Shah, et al.
6

Primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another style without modifying the linguistic content of the signal. Most of the state-of-the-art approaches convert emotions for seen speaker-emotion combinations only. In this paper, we tackle the problem of converting the emotion of speakers whose only neutral data are present during the time of training and testing (i.e., unseen speaker-emotion combinations). To this end, we extend a recently proposed StartGANv2-VC architecture by utilizing dual encoders for learning the speaker and emotion style embeddings separately along with dual domain source classifiers. For achieving the conversion to unseen speaker-emotion combinations, we propose a Virtual Domain Pairing (VDP) training strategy, which virtually incorporates the speaker-emotion pairs that are not present in the real data without compromising the min-max game of a discriminator and generator in adversarial training. We evaluate the proposed method using a Hindi emotional database.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/13/2020

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion

Emotional voice conversion aims to convert the emotion of the speech fro...
research
10/20/2021

Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity

Expressive voice conversion performs identity conversion for emotional s...
research
09/14/2023

StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings

Voice conversion (VC) transforms an utterance to sound like another pers...
research
10/04/2021

Decoupling Speaker-Independent Emotions for Voice Conversion Via Source-Filter Networks

Emotional voice conversion (VC) aims to convert a neutral voice to an em...
research
06/14/2022

Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

In this work, we explore a novel few-shot personalisation architecture f...
research
07/25/2020

Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator

We introduce a novel method for emotion conversion in speech that does n...
research
11/09/2022

A Diffeomorphic Flow-based Variational Framework for Multi-speaker Emotion Conversion

This paper introduces a new framework for non-parallel emotion conversio...

Please sign up or login with your details

Forgot password? Click here to reset