QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

03/14/2023
by Haobin Tang, et al.

Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech, but fine-grained styles such as intonation are often neglected. In this paper, we propose QI-TTS, which aims to better transfer and control intonation so as to convey the speaker's questioning intention while transferring emotion from reference speech. We propose a multi-style extractor that extracts style embeddings at two levels: the sentence level represents emotion, and the final-syllable level represents intonation. For fine-grained intonation control, we use relative attributes to represent intonation intensity at the syllable level. Experiments validate the effectiveness of QI-TTS in improving intonation expressiveness for emotional speech synthesis.
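As a rough illustration of the two-level idea described above, the sketch below shows how a multi-style extractor might produce a sentence-level emotion embedding and a final-syllable-level intonation embedding from a reference mel-spectrogram, with the intonation embedding scaled by an intensity value. This is a minimal PyTorch sketch based only on the abstract; the `MultiStyleExtractor` class, its module names, dimensions, pooling choices, and the intensity interface are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's code) of a two-level style extractor:
# a sentence-level branch for emotion and a final-syllable-level branch for
# intonation, as described in the QI-TTS abstract.
import torch
import torch.nn as nn


class MultiStyleExtractor(nn.Module):
    def __init__(self, mel_dim=80, hidden_dim=256, style_dim=128):
        super().__init__()
        # Shared frame-level encoder over the reference mel-spectrogram.
        self.frame_encoder = nn.GRU(mel_dim, hidden_dim,
                                    batch_first=True, bidirectional=True)
        # Sentence-level head: pools all frames into an emotion embedding.
        self.emotion_proj = nn.Linear(2 * hidden_dim, style_dim)
        # Syllable-level head: pools only the final-syllable frames into an
        # intonation embedding (questioning intonation is carried mainly by
        # the end of the utterance).
        self.intonation_proj = nn.Linear(2 * hidden_dim, style_dim)

    def forward(self, mel, final_syllable_mask, intonation_intensity=1.0):
        # mel: (B, T, mel_dim); final_syllable_mask: (B, T) with 1s on frames
        # belonging to the utterance-final syllable.
        frames, _ = self.frame_encoder(mel)               # (B, T, 2*hidden)
        emotion = self.emotion_proj(frames.mean(dim=1))   # sentence level
        mask = final_syllable_mask.unsqueeze(-1)          # (B, T, 1)
        pooled_tail = (frames * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        intonation = self.intonation_proj(pooled_tail)    # final-syllable level
        # Scale the intonation embedding by an intensity score; the abstract
        # says intensity comes from relative attributes, so treating it as a
        # single multiplicative scalar here is a hypothetical simplification.
        return emotion, intonation_intensity * intonation
```

In a full TTS pipeline, the two embeddings would typically be added to or concatenated with the text-encoder outputs that condition the acoustic decoder; that wiring, like the rest of the sketch, is an assumption rather than a detail given in the abstract.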

Related research

Emotion Intensity and its Control for Emotional Voice Conversion (01/10/2022)
Emotional voice conversion (EVC) seeks to convert the emotional state of...

Robust and fine-grained prosody control of end-to-end speech synthesis (11/06/2018)
We propose prosody embeddings for emotional and expressive speech synthe...

MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis (01/17/2022)
Expressive synthetic speech is essential for many human-computer interac...

CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation (06/27/2023)
Existing fine-grained intensity regulation methods rely on explicit cont...

Explicit Intensity Control for Accented Text-to-speech (10/27/2022)
Accented text-to-speech (TTS) synthesis seeks to generate speech with an...

Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities (03/02/2023)
State-of-the-art Text-To-Speech (TTS) models are capable of producing hi...

Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling (11/19/2022)
This paper aims to synthesize target speaker's speech with desired speak...