WaveFlow: A Compact Flow-based Model for Raw Audio

12/03/2019
by   Wei Ping, et al.
0

In this work, we present WaveFlow, a small-footprint generative flow for raw audio, which is trained with maximum likelihood without probability density distillation and auxiliary losses as used in Parallel WaveNet and ClariNet. It provides a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow as special cases. We systematically study these likelihood-based generative models for raw waveforms in terms of test likelihood and speech fidelity. We demonstrate that WaveFlow can synthesize high-fidelity speech as WaveNet, while only requiring a few sequential steps to generate very long waveforms with hundreds of thousands of time-steps. Furthermore, WaveFlow closes the significant likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has 5.91M parameters and can generate 22.05kHz high-fidelity speech 42.6 times faster than real-time on a GPU without engineered inference kernels.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro