Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions
VAE requires a standard Gaussian distribution as the prior in the latent space. Since all codes tend to follow the same prior, it often suffers from the so-called "posterior collapse". To avoid this, this paper introduces a class-specific distribution for the latent code. Unlike CVAE, however, we present a method for disentangling the latent space into label-relevant and label-irrelevant dimensions, z_s and z_u, for a single input. We apply two separate encoders to map the input into z_s and z_u respectively, and then feed the concatenated code to the decoder to reconstruct the input. The label-irrelevant code z_u represents the common characteristics of all inputs; hence it is constrained by the standard Gaussian prior, and its encoder is trained by amortized variational inference, as in a vanilla VAE. In contrast, z_s is assumed to follow a Gaussian mixture distribution in which each component corresponds to a particular class. The parameters of the Gaussian components in the z_s encoder are optimized under label supervision in a globally stochastic way. In theory, we show that our method is equivalent to adding a KL divergence term on the joint distribution of z_s and the class label c, and that it directly increases the mutual information between z_s and the label c. Our model can also be extended to a GAN by adding a discriminator in the pixel domain, enabling it to produce high-quality and diverse images.
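To make the setup concrete, below is a minimal PyTorch-style sketch of the two-encoder architecture described above, assuming diagonal Gaussian posteriors and a learnable Gaussian-mixture prior with one component per class. All layer sizes and names (DisentangledVAE, enc_s, enc_u, mu_c, logvar_c, etc.) are illustrative assumptions, not taken from the paper; the theoretical KL term on the joint distribution of (z_s, c) is approximated here by pulling each z_s posterior toward its class component.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledVAE(nn.Module):
    """Sketch: z_u ~ N(0, I) (label-irrelevant), z_s ~ Gaussian mixture with
    one learnable component per class (label-relevant). Illustrative only."""
    def __init__(self, x_dim, zs_dim, zu_dim, n_classes, h_dim=256):
        super().__init__()
        # Two separate encoders; each outputs a mean and a log-variance.
        self.enc_s = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(),
                                   nn.Linear(h_dim, 2 * zs_dim))
        self.enc_u = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(),
                                   nn.Linear(h_dim, 2 * zu_dim))
        # Decoder consumes the concatenated code [z_s, z_u].
        self.dec = nn.Sequential(nn.Linear(zs_dim + zu_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))
        # Learnable mixture parameters: one (mu_c, logvar_c) per class,
        # updated by label supervision along with the rest of the model.
        self.mu_c = nn.Parameter(torch.randn(n_classes, zs_dim))
        self.logvar_c = nn.Parameter(torch.zeros(n_classes, zs_dim))

    @staticmethod
    def reparam(mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, x, y):
        mu_s, logvar_s = self.enc_s(x).chunk(2, dim=-1)
        mu_u, logvar_u = self.enc_u(x).chunk(2, dim=-1)
        z_s = self.reparam(mu_s, logvar_s)
        z_u = self.reparam(mu_u, logvar_u)
        x_hat = self.dec(torch.cat([z_s, z_u], dim=-1))

        # KL(q(z_u|x) || N(0, I)): the usual VAE term for the shared code.
        kl_u = 0.5 * (mu_u.pow(2) + logvar_u.exp() - logvar_u - 1).sum(-1)

        # KL(q(z_s|x) || N(mu_y, sigma_y^2)): pull z_s toward the mixture
        # component of its ground-truth class y.
        mu_y, logvar_y = self.mu_c[y], self.logvar_c[y]
        kl_s = 0.5 * ((logvar_y - logvar_s)
                      + (logvar_s.exp() + (mu_s - mu_y).pow(2)) / logvar_y.exp()
                      - 1).sum(-1)

        recon = F.mse_loss(x_hat, x, reduction='none').sum(-1)
        return (recon + kl_u + kl_s).mean()
```

In this sketch, the GAN extension mentioned in the abstract would amount to adding a pixel-domain discriminator on x_hat and an adversarial term to the loss; that part is omitted here.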