RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses

11/01/2021
by   Shengyuan Xu, et al.

Most GAN (generative adversarial network)-based approaches to high-fidelity waveform generation rely heavily on discriminators to improve their performance. However, over-reliance on adversarial training introduces considerable uncertainty into the generation process and often results in mismatches of pitch and intensity, which is fatal in sensitive use cases such as singing voice synthesis (SVS). To address this problem, we propose RefineGAN, a high-fidelity neural vocoder with faster-than-real-time generation capability that focuses on robustness, pitch and intensity accuracy, and full-band audio generation. We employ a pitch-guided refine architecture with a multi-scale spectrogram-based loss function to help stabilize training and maintain the robustness of the neural vocoder while still using GAN-based training. Audio generated with this method performs better in subjective tests than the ground-truth audio. This result shows that fidelity is even improved during waveform reconstruction, because defects introduced by the speaker and the recording procedure are eliminated. Moreover, a further study shows that models trained on a specified type of data perform equally well on entirely unseen languages and unseen speakers. Generated sample pairs are provided at https://timedomain-tech.github.io/refinegan/.
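
The abstract names a multi-scale spectrogram-based loss but does not give its exact form. As a rough illustration only, the sketch below shows one common way such a loss is built: compare log-magnitude spectrograms of the generated and reference waveforms at several FFT resolutions and average the L1 distances. The function names, the FFT sizes (512, 1024, 2048), the hop size of one quarter of the FFT size, and the Hann window are all assumptions made for this example; they are not taken from the paper.

```python
import numpy as np


def magnitude_spectrogram(signal, fft_size, hop_size):
    """Magnitude STFT of a 1-D signal, using a Hann window and no padding."""
    window = np.hanning(fft_size)
    num_frames = 1 + (len(signal) - fft_size) // hop_size
    frames = np.stack([
        signal[i * hop_size:i * hop_size + fft_size] * window
        for i in range(num_frames)
    ])
    # rfft keeps only the non-negative frequencies: fft_size // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1))


def multi_scale_spectrogram_loss(generated, reference,
                                 fft_sizes=(512, 1024, 2048), eps=1e-7):
    """Mean L1 distance between log-magnitude spectrograms, averaged over scales.

    The resolutions and the log/L1 formulation are illustrative assumptions,
    not the configuration used by RefineGAN.
    """
    total = 0.0
    for fft_size in fft_sizes:
        hop_size = fft_size // 4
        spec_gen = magnitude_spectrogram(generated, fft_size, hop_size)
        spec_ref = magnitude_spectrogram(reference, fft_size, hop_size)
        total += np.mean(np.abs(np.log(spec_gen + eps) - np.log(spec_ref + eps)))
    return total / len(fft_sizes)


# Usage: a 220 Hz tone compared with itself and with a detuned 233 Hz tone.
sample_rate = 16000
t = np.arange(8192) / sample_rate
reference_tone = np.sin(2 * np.pi * 220.0 * t)
detuned_tone = np.sin(2 * np.pi * 233.0 * t)
loss_same = multi_scale_spectrogram_loss(reference_tone, reference_tone)
loss_detuned = multi_scale_spectrogram_loss(detuned_tone, reference_tone)
```

Because the loss is computed directly on spectrograms rather than through a discriminator, a pitch error such as the detuned tone above shifts energy between frequency bins and is penalized deterministically, which is one plausible reading of how a loss of this kind could help stabilize GAN-based training.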


research
06/04/2021

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Although recent works on neural vocoder have improved the quality of syn...
research
12/06/2018

Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder

Neural networks based vocoders, typically the WaveNet, have achieved spe...
research
06/09/2022

BigVGAN: A Universal Neural Vocoder with Large-Scale Training

Despite recent progress in generative adversarial network(GAN)-based voc...
research
11/19/2020

Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains

We propose Universal MelGAN, a vocoder that synthesizes high-fidelity sp...
research
09/06/2023

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

Generative adversarial network (GAN)-based vocoders have been intensivel...
research
03/26/2021

Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN

GAN-based neural vocoders, such as Parallel WaveGAN and MelGAN have attr...
research
06/15/2021

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Most neural vocoders employ band-limited mel-spectrograms to generate wa...
