Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling

06/21/2019
by Yuan-Hao Yi, et al.

This paper presents a method that uses autoregressive neural networks for acoustic modeling in singing voice synthesis (SVS). Singing voice differs from speech in that it contains more local dynamic movements of acoustic features, e.g., vibratos. Our method therefore adopts deep autoregressive (DAR) models to predict the F0 and spectral features of singing voice, so as to better describe the dependencies among the acoustic features of consecutive frames. For F0 modeling, discretized F0 values are used, and the influence of the history length in the DAR model is analyzed experimentally. An F0 post-processing strategy is also designed to alleviate the inconsistency between the predicted F0 contours and the F0 values determined by the music notes. Furthermore, we extend the DAR model to handle continuous spectral features, and introduce a prenet module with self-attention layers to process historical frames. Experiments on a Chinese singing voice corpus demonstrate that our DAR-based method can effectively produce F0 contours with vibratos, and that it achieves better objective and subjective performance than the conventional method using recurrent neural networks (RNNs).
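
The abstract mentions discretized F0 values and a configurable history length for the autoregressive model, but it does not give the quantization scheme. The sketch below is only an illustration of the general idea, not the authors' implementation: the log-frequency grid, the 80-1000 Hz range, the 256 bins, the reserved unvoiced class, and the helper names `f0_to_class`, `class_to_f0`, and `history_windows` are all assumptions made for this example.

```python
import math

# Illustrative settings only; none of these values come from the paper.
F0_MIN_HZ = 80.0     # assumed lower bound of the F0 grid
F0_MAX_HZ = 1000.0   # assumed upper bound of the F0 grid
NUM_BINS = 256       # assumed number of voiced F0 classes
UNVOICED = 0         # class index reserved here for unvoiced frames


def f0_to_class(f0_hz):
    """Map an F0 value in Hz to a class index on a log-frequency grid."""
    if f0_hz <= 0.0:
        return UNVOICED
    clipped = min(max(f0_hz, F0_MIN_HZ), F0_MAX_HZ)
    # Relative position of the value on the log scale, in [0, 1].
    pos = math.log(clipped / F0_MIN_HZ) / math.log(F0_MAX_HZ / F0_MIN_HZ)
    return 1 + round(pos * (NUM_BINS - 1))


def class_to_f0(index):
    """Map a class index back to the F0 value (Hz) at the bin position."""
    if index == UNVOICED:
        return 0.0
    pos = (index - 1) / (NUM_BINS - 1)
    return F0_MIN_HZ * (F0_MAX_HZ / F0_MIN_HZ) ** pos


def history_windows(classes, history_len):
    """Pair each frame's class with the history_len classes preceding it.

    Frames before the start of the sequence are padded with UNVOICED.
    """
    padded = [UNVOICED] * history_len + list(classes)
    return [
        (padded[t:t + history_len], padded[t + history_len])
        for t in range(len(classes))
    ]
```

With pairs like these, an autoregressive classifier would be trained to predict each frame's F0 class from its history (together with the musical score features), and varying `history_len` is one way to study how much past context the model needs.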

research
04/23/2020

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

This paper presents ByteSing, a Chinese singing voice synthesis (SVS) sy...
research
03/15/2023

PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor

Singing voice synthesis (SVS), as a specific task for generating the voc...
research
10/16/2018

Sequence-to-Sequence Acoustic Modeling for Voice Conversion

In this paper, a neural network named Sequence-to-sequence ConvErsion N...
research
01/24/2018

Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

This paper presents a waveform modeling and generation method using hier...
research
08/06/2021

An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures

With the rapid development of neural network architectures and speech pr...
research
06/15/2021

MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

Recent developments in deep learning have significantly improved the qua...
research
04/15/2019

Singing voice synthesis based on convolutional neural networks

The present paper describes a singing voice synthesis based on convoluti...