A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

10/12/2018
by Azam Rabiee, et al.

This paper introduces a deep neural network model for a subband-based speech synthesizer. The model exploits the narrow bandwidth of the subband signals to reduce the complexity of the time-domain speech generator. We employ multi-level wavelet analysis and synthesis to decompose the signal into subbands and to reconstruct it, entirely in the time domain. Inspired by WaveNet, a convolutional neural network (CNN) model predicts the subband speech signals fully in the time domain. Because each subband has a narrow bandwidth, a simple network architecture is enough to learn its simple patterns accurately. In ground-truth experiments with teacher forcing, the subband synthesizer significantly outperforms the fullband model. In addition, by conditioning the model on the phoneme sequence obtained from a pronunciation dictionary, we have achieved the first fully time-domain neural text-to-speech (TTS) system. The speech generated by the subband TTS is of quality comparable to that of the fullband system, while using a lighter network architecture for each subband.
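To make the decompose/reconstruct step concrete, the following is a minimal sketch of multi-level wavelet analysis and synthesis in plain Python. It assumes the Haar wavelet purely for illustration, since the abstract does not name the wavelet family used, and the function names `haar_analysis` and `haar_synthesis` are hypothetical. Each level splits the current low band into a low-pass and a high-pass half, each at half the sample rate, so the subbands are shorter and narrower in bandwidth than the fullband signal.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_analysis(signal, levels):
    """Decompose `signal` into levels + 1 time-domain subbands.

    Returns [approx_L, detail_L, ..., detail_1]. The Haar wavelet is an
    illustrative assumption, not necessarily the paper's choice.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * _S for a, b in pairs])  # high-pass half
        approx = [(a + b) * _S for a, b in pairs]         # low-pass half
    return [approx] + details[::-1]


def haar_synthesis(subbands):
    """Reconstruct the fullband signal from the output of haar_analysis."""
    approx = list(subbands[0])
    for detail in subbands[1:]:
        merged = []
        for a, d in zip(approx, detail):
            merged.append((a + d) * _S)  # even-indexed sample
            merged.append((a - d) * _S)  # odd-indexed sample
        approx = merged
    return approx


if __name__ == "__main__":
    x = [math.sin(0.3 * n) for n in range(16)]
    bands = haar_analysis(x, levels=3)
    y = haar_synthesis(bands)
    print([len(b) for b in bands])
    print(max(abs(a - b) for a, b in zip(x, y)))
```

In a subband synthesizer of the kind described above, a separate generator would predict each of these subband sequences, and the synthesis step would then recombine the predictions into the output waveform.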

research
11/24/1998

Text-To-Speech Conversion with Neural Networks: A Recurrent TDNN Approach

This paper describes the design of a neural network that performs the ph...
research
06/15/2016

Multi-Modal Hybrid Deep Neural Network for Speech Enhancement

Deep Neural Networks (DNN) have been successful in enhancing noisy spe...
research
08/24/2020

AMRConvNet: AMR-Coded Speech Enhancement Using Convolutional Neural Networks

Speech is converted to digital signals using speech coding for efficient...
research
03/11/2019

Deep Text-to-Speech System with Seq2Seq Model

Recent trends in neural network based text-to-speech/speech synthesis pi...
research
04/15/2020

Explaining Regression Based Neural Network Model

Several methods have been proposed to explain Deep Neural Network (DNN)....
research
11/02/2022

Neural Fourier Shift for Binaural Speech Rendering

We present a neural network for rendering binaural speech from given mon...
research
05/03/2021

Full-Reference Speech Quality Estimation with Attentional Siamese Neural Networks

In this paper, we present a full-reference speech quality prediction mod...
