StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis

10/07/2021
by   Rui Liu, et al.

Recently, emotional speech synthesis has achieved remarkable performance. The emotion strength of synthesized speech can be controlled flexibly using a strength descriptor, which is obtained by an emotion attribute ranking function. However, a ranking function trained on specific data generalizes poorly, which limits its applicability to more realistic cases. In this paper, we propose a deep-learning-based emotion strength assessment network for strength prediction, referred to as StrengthNet. Our model follows a multi-task learning framework with a structure that includes an acoustic encoder, a strength predictor, and an auxiliary emotion predictor. A data augmentation strategy is used to improve model generalization. Experiments show that the emotion strength predicted by the proposed StrengthNet is highly correlated with ground-truth scores for both seen and unseen speech. Our code is available at: https://github.com/ttslr/StrengthNet.
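The multi-task structure described above, a shared acoustic encoder feeding both a strength predictor and an auxiliary emotion predictor, can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the layer types, sizes, pooling, and loss weighting are all assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class StrengthNetSketch(nn.Module):
    """Illustrative multi-task model: a shared acoustic encoder feeds a
    strength predictor (regression) and an auxiliary emotion predictor
    (classification). Layer sizes are hypothetical, not the paper's."""

    def __init__(self, n_mels=80, hidden=128, n_emotions=5):
        super().__init__()
        # Shared acoustic encoder over mel-spectrogram frames.
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True,
                              bidirectional=True)
        # Strength head: one scalar in [0, 1] per utterance.
        self.strength_head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())
        # Auxiliary emotion head: class logits per utterance.
        self.emotion_head = nn.Linear(2 * hidden, n_emotions)

    def forward(self, mel):  # mel: (batch, frames, n_mels)
        frames, _ = self.encoder(mel)
        pooled = frames.mean(dim=1)  # average-pool over time
        strength = self.strength_head(pooled).squeeze(-1)
        return strength, self.emotion_head(pooled)

model = StrengthNetSketch()
mel = torch.randn(4, 200, 80)  # 4 utterances, 200 mel frames each
strength, emotion_logits = model(mel)

# Joint training objective: strength regression plus the auxiliary
# emotion classification task (equal weighting assumed here).
target_strength = torch.rand(4)
target_emotion = torch.randint(0, 5, (4,))
loss = (nn.functional.mse_loss(strength, target_strength)
        + nn.functional.cross_entropy(emotion_logits, target_emotion))
```

The auxiliary emotion head gives the shared encoder an extra supervision signal, which is the usual motivation for multi-task setups like this one.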


Related research

06/15/2022 · Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning
Emotion classification of speech and assessment of the emotion strength ...

11/17/2020 · Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
This paper proposes a unified model to conduct emotion transfer, control...

11/17/2020 · Controllable Emotion Transfer For End-to-End Speech Synthesis
Emotion embedding space learned from references is a straightforward app...

06/30/2022 · Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems
This paper proposes an effective emotional text-to-speech (TTS) system w...

10/19/2021 · Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation
Learning emotion embedding from reference audio is a straightforward app...

05/27/2019 · EG-GAN: Cross-Language Emotion Gain Synthesis based on Cycle-Consistent Adversarial Networks
Despite remarkable contributions from existing emotional speech synthesi...

07/12/2023 · Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers
Despite recent advancements in speech emotion recognition (SER) models, ...
