Mixed Emotion Modelling for Emotional Voice Conversion

by   Kun Zhou, et al.

Emotional voice conversion (EVC) aims to convert the emotional state of an utterance from one emotion to another while preserving the linguistic content and speaker identity. Current studies mostly focus on modelling the conversion between several specific emotion types. Synthesizing mixed effects of emotions could help us to better imitate human emotions, and facilitate more natural human-computer interaction. In this research, for the first time, we formulate and study the research problem of mixed emotion synthesis for EVC. We regard emotional styles as a series of emotion attributes that are learnt from a ranking-based support vector machine (SVM). Each attribute measures the degree of the relevance between the speech recordings belonging to different emotion types. We then incorporate those attributes into a sequence-to-sequence (seq2seq) emotional voice conversion framework. During the training, the framework not only learns to characterize the input emotional style, but also quantifies its relevance with other emotion types. At run-time, various emotional mixtures can be produced by manually defining the attributes. We conduct objective and subjective evaluations to validate our idea in terms of mixed emotion synthesis. We further build an emotion triangle as an application of emotion transition. Codes and speech samples are publicly available.


page 1

page 2

page 3

page 4


Speech Synthesis with Mixed Emotions

Emotional speech synthesis aims to synthesize human voices with various ...

Emotion Intensity and its Control for Emotional Voice Conversion

Emotional voice conversion (EVC) seeks to convert the emotional state of...

An Overview Analysis of Sequence-to-Sequence Emotional Voice Conversion

Emotional voice conversion (EVC) focuses on converting a speech utteranc...

Spanish expressive voices: Corpus for emotion research in spanish

A new emotional multimedia database has been recorded and aligned. The d...

EmoCat: Language-agnostic Emotional Voice Conversion

Emotional voice conversion models adapt the emotion in speech without ch...

StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings

Voice conversion (VC) transforms an utterance to sound like another pers...

Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning

Emotion classification of speech and assessment of the emotion strength ...

Please sign up or login with your details

Forgot password? Click here to reset