Clustering Bioactive Molecules in 3D Chemical Space with Unsupervised Deep Learning

02/09/2019
by Chu Qin, et al.

Unsupervised clustering has broad applications in data stratification, pattern investigation, and the discovery of knowledge beyond what is already established. In particular, clustering of bioactive molecules facilitates chemical space mapping, structure-activity studies, and drug discovery. These tasks, conventionally carried out with similarity-based methods, are complicated by the complexity and diversity of the data. We explored the learning capability of deep autoencoders for unsupervised clustering of 1.39 million bioactive molecules into band-clusters in a 3-dimensional latent chemical space. These band-clusters, displayed with space-navigation simulation software, group molecules of selected bioactivity classes into individual band-clusters, each possessing a unique set of common sub-structural features that go beyond structural similarity. These sub-structural features form the frameworks of literature-reported pharmacophores and privileged fragments. Within each band-cluster, molecules are further grouped into distinct sub-regions according to their bioactivity targets, sub-structural features, and molecular scaffolds. Our method is potentially applicable to big-data clustering tasks in other fields.
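To make the idea of a 3-dimensional latent space concrete, the following is a minimal sketch of an autoencoder with a 3-unit bottleneck. It is not the authors' architecture: the abstract does not specify layers, input features, or training details, so everything here is an assumption. The input is synthetic binary "fingerprints" from a hypothetical helper `make_toy_fingerprints`, the encoder is a single linear layer rather than a deep network, and training uses plain gradient descent on a binary cross-entropy reconstruction loss.

```python
import numpy as np


def make_toy_fingerprints(n_per_class=100, n_bits=64, seed=0):
    """Synthetic stand-in for molecular fingerprints (an assumption, not real data).

    Two artificial classes differ in which half of the bits is usually set.
    """
    rng = np.random.default_rng(seed)
    half = n_bits // 2
    p_a = np.r_[np.full(half, 0.9), np.full(n_bits - half, 0.1)]
    p_b = p_a[::-1]
    a = rng.random((n_per_class, n_bits)) < p_a
    b = rng.random((n_per_class, n_bits)) < p_b
    return np.vstack([a, b]).astype(float)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce(x, y, eps=1e-9):
    """Mean binary cross-entropy between targets x and predictions y."""
    y = np.clip(y, eps, 1.0 - eps)
    return float(-np.mean(x * np.log(y) + (1.0 - x) * np.log(1.0 - y)))


def train_autoencoder(x, latent_dim=3, lr=0.2, steps=500, seed=0):
    """Train a shallow autoencoder and return (latent coordinates, loss history)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = 0.1 * rng.standard_normal((d, latent_dim))
    b_enc = np.zeros(latent_dim)
    w_dec = 0.1 * rng.standard_normal((latent_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(steps):
        # Forward pass: encode to the 3-D bottleneck, then reconstruct the bits.
        z = x @ w_enc + b_enc
        y = sigmoid(z @ w_dec + b_dec)
        losses.append(bce(x, y))
        # Backward pass: gradient of the mean BCE with respect to the logits.
        g_logits = (y - x) / (n * d)
        g_w_dec = z.T @ g_logits
        g_b_dec = g_logits.sum(axis=0)
        g_z = g_logits @ w_dec.T
        g_w_enc = x.T @ g_z
        g_b_enc = g_z.sum(axis=0)
        # Plain gradient-descent update.
        w_enc -= lr * g_w_enc
        b_enc -= lr * g_b_enc
        w_dec -= lr * g_w_dec
        b_dec -= lr * g_b_dec
    return x @ w_enc + b_enc, losses


fingerprints = make_toy_fingerprints()
latent, history = train_autoencoder(fingerprints)
print(latent.shape, history[0], history[-1])
```

Each row of `latent` is one molecule's coordinate in the 3-D space, which is the kind of representation that could then be visualized or clustered. A real application would replace the toy data with actual molecular descriptors and the linear encoder with a deeper network.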


