Benchmarking Generative Latent Variable Models for Speech

02/22/2022
by Jakob D. Havtorn, et al.

Stochastic latent variable models (LVMs) achieve state-of-the-art performance on natural image generation but are still inferior to deterministic models on speech. In this paper, we develop a speech benchmark of popular temporal LVMs and compare them against state-of-the-art deterministic models. We report the likelihood, a metric that is widely used in the image domain but is rarely reported for speech models, and often in ways that cannot be compared across papers. To assess the quality of the learned representations, we also compare their usefulness for phoneme recognition. Finally, we adapt the Clockwork VAE, a state-of-the-art temporal LVM for video generation, to the speech domain. We find that the Clockwork VAE, despite being autoregressive only in latent space, can outperform previous LVMs and reduce the gap to deterministic models by using a hierarchy of latent variables.

research
11/08/2018

Disentangling Latent Factors with Whitening

After the success of deep generative models in image generation tasks, l...
research
02/06/2019

BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling

With the introduction of the variational autoencoder (VAE), probabilisti...
research
08/26/2019

PixelVAE++: Improved PixelVAE with Discrete Prior

Constructing powerful generative models for natural images is a challeng...
research
02/04/2019

Re-examination of the Role of Latent Variables in Sequence Modeling

With latent variables, stochastic recurrent models have achieved state-o...
research
11/30/2017

Auxiliary Guided Autoregressive Variational Autoencoders

Generative modeling of high-dimensional data is a key problem in machine...
research
05/28/2018

Theory and Experiments on Vector Quantized Autoencoders

Deep neural networks with discrete latent variables offer the promise of...
research
10/24/2020

A Comparison of Discrete Latent Variable Models for Speech Representation Learning

Neural latent variable models enable the discovery of interesting struct...
