Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis

06/29/2020
by Antti Suni, et al.

Recent advances in deep learning methods have elevated synthetic speech quality to human level, and the field is now moving towards addressing prosodic variation in synthetic speech. Despite successes in this effort, state-of-the-art systems fall short of faithfully reproducing local prosodic events that give rise to, e.g., word-level emphasis and phrasal structure. This type of prosodic variation often reflects long-distance semantic relationships that are not accessible to end-to-end systems whose synthesis domain is a single sentence. One possible solution is to condition the synthesized speech on explicit prosodic labels, which could be generated from longer portions of text. In this work, we evaluate whether augmenting the textual input with such prosodic labels, capturing word-level prominence and phrasal boundary strength, results in a more accurate realization of sentence prosody. We use an automatic wavelet-based technique to extract these labels from speech material and feed them, alongside the textual information, as input to a Tacotron-like synthesis system. Objective evaluation of the synthesized speech shows that using the prosodic labels significantly improves the faithfulness of the f0 and energy contours in comparison with state-of-the-art implementations.
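
The label-augmentation step can be pictured with a minimal Python sketch. It assumes the wavelet-based extractor has already produced one continuous prominence value and one boundary-strength value per word. The function names, the two quantization thresholds and the `<pN>`/`<bN>` token format are hypothetical choices for illustration, not the authors' actual input representation.

```python
from typing import List, Sequence


def quantize(value: float, thresholds: Sequence[float]) -> int:
    """Map a continuous label to a discrete class.

    The class index is the number of (ascending) thresholds the value
    meets or exceeds, so two thresholds give classes 0, 1 and 2.
    """
    return sum(value >= t for t in thresholds)


def augment_input(
    words: Sequence[str],
    prominence: Sequence[float],
    boundary: Sequence[float],
    prom_thresholds: Sequence[float] = (0.5, 1.5),   # hypothetical bin edges
    bound_thresholds: Sequence[float] = (0.5, 1.5),  # hypothetical bin edges
) -> List[str]:
    """Interleave character symbols with word-level prosodic label tokens.

    Hypothetical format: each word's characters are followed by one
    prominence token <pN> and one boundary token <bN>; words are
    separated by a space symbol.
    """
    if not (len(words) == len(prominence) == len(boundary)):
        raise ValueError("need one prominence and one boundary value per word")
    tokens: List[str] = []
    for word, p, b in zip(words, prominence, boundary):
        tokens.extend(word)  # character-level text symbols
        tokens.append(f"<p{quantize(p, prom_thresholds)}>")
        tokens.append(f"<b{quantize(b, bound_thresholds)}>")
        tokens.append(" ")
    return tokens[:-1]  # drop the trailing word separator (no-op if empty)


print(augment_input(["the", "cat"], [0.2, 2.0], [0.0, 1.0]))
```

The resulting symbol sequence would then be mapped to integer IDs and embedded like ordinary characters at the encoder input. Discretizing keeps the label vocabulary small; continuous values could instead be attached as extra per-symbol features.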
