Predicting Drug Solubility Using Different Machine Learning Methods – Linear Regression Model with Extracted Chemical Features vs Graph Convolutional Neural Network

by John Ho, et al.

Predicting the solubility of a given molecule is an important task in the pharmaceutical industry and, consequently, a well-studied topic. In this research, we revisit the problem with the advantage of modern computing resources. We applied two machine learning models, a linear regression model and a graph convolutional neural network (GCNN) model, to multiple experimental datasets. Both methods made reasonable predictions, with the GCNN model performing best. However, the current GCNN model is a black box, whereas feature importance analysis from the linear regression model offers more insight into the underlying chemical influences. Using the linear regression model, we show how each functional group affects overall solubility. Ultimately, knowing how chemical structure influences chemical properties is crucial when designing new drugs. Future work should aim to combine the high performance of GCNNs with the interpretability of linear regression, unlocking new advances in next-generation high-throughput screening.
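
To make the interpretability point concrete, here is a minimal sketch of how fitted linear regression coefficients can be read as per-feature influences on solubility. It is an illustration under stated assumptions, not the authors' pipeline: the feature names, the coefficient values, and the data are all synthetic and hypothetical, and ordinary least squares via NumPy stands in for whatever fitting procedure the paper actually used.

```python
import numpy as np

# Hypothetical functional-group count features; the paper's real feature set
# is not listed in the abstract.
FEATURES = ["hydroxyl_count", "aromatic_ring_count", "carboxyl_count"]


def fit_linear_model(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefs)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def make_synthetic_data(n=200, seed=0):
    """Generate toy data from made-up coefficients (NOT values from the paper)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 4, size=(n, len(FEATURES))).astype(float)
    true_coefs = np.array([0.5, -1.0, 0.3])  # assumed, for illustration only
    y = -2.0 + X @ true_coefs + rng.normal(0.0, 0.05, size=n)
    return X, y, true_coefs


X, y, true_coefs = make_synthetic_data()
intercept, coefs = fit_linear_model(X, y)

# Rank features by the magnitude of their fitted coefficient.
ranking = sorted(zip(FEATURES, coefs), key=lambda t: abs(t[1]), reverse=True)
for name, c in ranking:
    print(f"{name}: {c:+.2f}")
```

In this toy setting, the sign of each coefficient indicates whether adding one more instance of the group raises or lowers the predicted value, and the magnitude indicates by how much. That direct reading is what a black-box GCNN does not provide.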



Bayesian Analysis on Limiting the Student-t Linear Regression Model

For the outlier problem in linear regression models, the Student-t linea...

Optimizing Offensive Gameplan in the National Basketball Association with Machine Learning

Throughout the analytical revolution that has occurred in the NBA, the d...

Size doesn't matter: predicting physico- or biochemical properties based on dozens of molecules

The use of machine learning in chemistry has become a common practice. A...

An Inverse QSAR Method Based on Linear Regression and Integer Programming

Recently a novel framework has been proposed for designing the molecular...

Supervised Linear Regression for Graph Learning from Graph Signals

We propose a supervised learning approach for predicting an underlying g...

Sources of high leverage in linear regression model

Some reasons for high leverage are analytically investigated by decompos...
