PartitionVAE – a human-interpretable VAE

by Fareed Sheriff, et al.

VAEs, or variational autoencoders, are autoencoders that explicitly learn the distribution of the input image space rather than assuming no prior information about that distribution. This allows them to place similar samples close to each other in the latent space. VAEs classically assume the latent space is normally distributed, though many other priors work, and they encode this assumption through a Kullback-Leibler (KL) divergence term in the loss function. While VAEs learn the distribution of the latent space and naturally make each latent dimension as disjoint from the others as possible, they do not group similar features together: the image-space feature represented by one unit of the representation layer is not necessarily highly correlated with the feature represented by a neighboring unit. This makes VAEs difficult to interpret, since the representation layer is not structured in a way that is easy for humans to parse. We aim to make a more interpretable VAE by partitioning the representation layer into disjoint sets of interconnected units. This partitioning imposes a prior that the features of the input space are grouped together by correlation; we call the resulting model a partition VAE, or PVAE. For example, if our image space were the space of all ping pong game images (a somewhat complex image space we use to test our architecture), we would hope that each partition in the representation layer learns some large feature of the image, such as the characteristics of the ping pong table, the characteristics and position of the players, or the ball.
We also add a cost-saving measure to the PVAE: subresolution. Because we do not have access to GPU training environments for long periods of time and Google Colab Pro costs money, we reduce the complexity of the PVAE by having it output an image whose dimensions are scaled down from those of the input image by a constant factor. To calculate the loss and train, we then upscale this output to the input resolution by interpolating between neighboring pixels. We train a tuned PVAE on MNIST and Sports10 to test its effectiveness.
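The subresolution idea can be illustrated with a minimal sketch. The abstract does not state which interpolation scheme or reconstruction loss is used, so bilinear interpolation and mean squared error are assumptions here, and the function names `upsample_bilinear` and `subresolution_mse` are hypothetical. The sketch uses plain Python lists for a single grayscale image; it is meant to show the shape of the computation, not the authors' implementation.

```python
def upsample_bilinear(img, factor):
    """Upscale a 2-D grayscale image (list of rows) by an integer factor.

    Bilinear interpolation is an assumption; the source only says the
    output is interpolated through neighboring pixels.
    """
    h, w = len(img), len(img[0])
    out = []
    for i in range(h * factor):
        # Map the output pixel centre back to source coordinates, clamped to the image.
        y = min(max((i + 0.5) / factor - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for j in range(w * factor):
            x = min(max((j + 0.5) / factor - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            # Blend the four neighboring source pixels.
            top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
            bottom = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


def subresolution_mse(target, low_res_output, factor):
    """Mean squared error between the full-resolution target and the
    upscaled low-resolution model output (MSE is an assumed loss)."""
    up = upsample_bilinear(low_res_output, factor)
    if len(up) != len(target) or len(up[0]) != len(target[0]):
        raise ValueError("upscaled output does not match target size")
    n = len(target) * len(target[0])
    return sum(
        (t - u) ** 2
        for t_row, u_row in zip(target, up)
        for t, u in zip(t_row, u_row)
    ) / n
```

In this sketch the decoder would only need to produce `1 / factor**2` as many pixels as the input contains, which is where the cost saving would come from; the loss is still measured at the input resolution.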




