Graphical Gaussian Process Regression Model for Aqueous Solvation Free Energy Prediction of Organic Molecules in Redox Flow Battery

by Peiyuan Gao, et al.

The solvation free energy of organic molecules is a critical parameter in determining emergent properties such as solubility, liquid-phase equilibrium constants, pKa, and redox potentials in an organic redox flow battery. In this work, we present a machine learning (ML) model that learns and predicts the aqueous solvation free energy of an organic molecule using Gaussian process regression with a new molecular graph kernel. To investigate how well the ML model captures the electrostatic interaction, the nonpolar interaction contribution of the solvent, and the conformational entropy of the solute in the solvation free energy, we test three data sets covering implicit or explicit water solvent models and the contribution of the solute's conformational entropy. We demonstrate that our ML model predicts the solvation free energy of molecules at chemical accuracy, with a mean absolute error of less than 1 kcal/mol, for subsets of the QM9 dataset and the FreeSolv database. To address the general data scarcity problem for graph-based ML models, we propose a dimension reduction algorithm based on the distance between molecular graphs, which can be used to examine the diversity of a molecular data set. It provides a promising way to build a minimal training set that improves prediction for test sets where the space of molecular structures is predetermined.
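The two ingredients named in the abstract, Gaussian process regression on a kernel between molecules and a distance between molecular graphs, can be illustrated with a minimal sketch. This is not the authors' implementation. The molecular graph kernel is not specified here, so an RBF kernel on random feature vectors stands in for it. The distance shown is the standard kernel-induced distance, which is an assumption about how a graph distance might be obtained from a kernel. All function names, the noise level, and the toy data are hypothetical.

```python
import numpy as np

def gpr_fit(K_train, y_train, noise):
    """Solve (K + noise * I) alpha = y with a Cholesky factorization."""
    n = K_train.shape[0]
    L = np.linalg.cholesky(K_train + noise * np.eye(n))
    return np.linalg.solve(L.T, np.linalg.solve(L, y_train))

def gpr_predict(K_query_train, alpha):
    """GP posterior mean: kernel between query and training points times alpha."""
    return K_query_train @ alpha

def kernel_distance(K):
    """Kernel-induced distance d_ij = sqrt(K_ii + K_jj - 2 K_ij)."""
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative round-off

def rbf(A, B, length=1.0):
    """Stand-in kernel (assumption); a graph kernel would replace this."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

# Toy data: 20 hypothetical "molecules" described by 3 made-up features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X.sum(axis=1)          # made-up target, not real solvation free energies

noise = 1e-4               # assumed observation-noise variance
K = rbf(X, X)
alpha = gpr_fit(K, y, noise)
y_fit = gpr_predict(K, alpha)   # posterior mean at the training points
D = kernel_distance(K)          # pairwise distances between the toy items
```

With a real graph kernel, `K` would be the kernel matrix between molecular graphs, and `D` could feed a dimension reduction method such as multidimensional scaling to inspect how diverse a data set is.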


Molecular-orbital-based Machine Learning for Open-shell and Multi-reference Systems with Kernel Addition Gaussian Process Regression

We introduce a novel machine learning strategy, kernel addition Gaussian...

On the Interplay of Subset Selection and Informed Graph Neural Networks

Machine learning techniques paired with the availability of massive data...

Prediction of Atomization Energy Using Graph Kernel and Active Learning

Data-driven prediction of molecular properties presents unique challenge...

Building Robust Machine Learning Models for Small Chemical Science Data: The Case of Shear Viscosity

Shear viscosity, though being a fundamental property of all liquids, is ...

Deciphering Cryptic Behavior in Bimetallic Transition Metal Complexes with Machine Learning

The rational tailoring of transition metal complexes is necessary to add...

Less is more: sampling chemical space with active learning

The development of accurate and transferable machine learning (ML) poten...

Particle Swarm Based Hyper-Parameter Optimization for Machine Learned Interatomic Potentials

Modeling non-empirical and highly flexible interatomic potential energy ...
