Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

08/20/2020
by   Noé Tits, et al.

Despite the growing interest in expressive speech synthesis, the synthesis of nonverbal expressions remains an under-explored area. In this paper, we propose an audio laughter synthesis system based on a sequence-to-sequence TTS synthesis system. We leverage transfer learning by training a deep learning model to generate both speech and laughs from annotations. We evaluate our model with a listening test, comparing its performance to that of an HMM-based laughter synthesis system, and find that it reaches higher perceived naturalness. Our solution is a first step towards a TTS system that can synthesize speech with control over the amusement level through laughter integration.

research
10/29/2018

Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention

Currently, there are increasing interests in text-to-speech (TTS) synthe...
research
10/14/2019

The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach

As part of the Human-Computer Interaction field, Expressive speech synth...
research
05/31/2021

Byakto Speech: Real-time long speech synthesis with convolutional neural network: Transfer learning from English to Bangla

Speech synthesis is one of the challenging tasks to automate by deep lea...
research
01/17/2022

MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis

Expressive synthetic speech is essential for many human-computer interac...
research
04/06/2022

Simple and Effective Unsupervised Speech Synthesis

We introduce the first unsupervised speech synthesis system based on a s...
research
01/25/2023

A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

Expressive speech-to-speech translation (S2ST) aims to transfer prosodic...
research
03/29/2022

Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis

End-to-end text-to-speech synthesis (TTS), which generates speech sounds...
