Characterizing the Latent Space of Molecular Deep Generative Models with Persistent Homology Metrics

by Yair Schiff, et al.

Deep generative models are increasingly becoming integral parts of the in silico molecule design pipeline. They have the dual goals of learning the chemical and structural features that render candidate molecules viable while remaining flexible enough to generate novel designs. In particular, Variational Autoencoders (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions in such a way that the latent space of the encoder network is smooth, so novel candidates can be found by sampling from this latent space. However, the space of architectures and hyperparameters is vast, and choosing the best combination for in silico discovery has important implications for downstream success. It is therefore important to develop a principled methodology for distinguishing how well a given generative model learns salient molecular features. In this work, we propose a method for measuring how well the latent space of deep generative models encodes structural and chemical features of molecular datasets by correlating latent space metrics with metrics from the field of topological data analysis (TDA). We apply our evaluation methodology to a VAE trained on SMILES strings and show that 3D topology information is consistently encoded throughout the latent space of the model.
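The abstract's core idea, correlating distances in latent space with differences in a persistent-homology summary of each molecule's 3D structure, can be sketched in pure NumPy/SciPy. This is an illustrative sketch, not the authors' implementation: it uses the fact that for H0 (connected components) of a Vietoris-Rips filtration, the feature death times are exactly the edge weights of a minimum spanning tree, which avoids depending on a dedicated TDA library. All function names below are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def h0_persistence(points):
    """Death times of H0 features in a Vietoris-Rips filtration of a point cloud.

    Every H0 feature is born at scale 0 and dies when two components merge,
    so the death times are the minimum-spanning-tree edge weights.
    """
    dist = squareform(pdist(points))
    mst = minimum_spanning_tree(dist)
    return np.sort(mst.data)

def topological_summary(points):
    """Total H0 persistence: a simple scalar descriptor of 3D structure."""
    return h0_persistence(points).sum()

def latent_topology_correlation(latent_vecs, conformers):
    """Spearman correlation between pairwise latent-space distances and
    pairwise differences in each molecule's topological summary.

    latent_vecs: (n_molecules, latent_dim) array of VAE encodings.
    conformers:  list of (n_atoms_i, 3) coordinate arrays (hypothetical input).
    """
    topo = np.array([topological_summary(c) for c in conformers])
    latent_dists = pdist(latent_vecs)
    topo_dists = pdist(topo[:, None])
    rho, _ = spearmanr(latent_dists, topo_dists)
    return rho
```

A high correlation under a summary like this would suggest that the latent space organizes molecules by 3D topology; the paper's actual metrics and persistence computations may differ.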

