Learning inducing points and uncertainty on molecular data

by Mikhail Tsitsvero, et al.

Uncertainty control and scalability to large datasets are the two main obstacles to deploying Gaussian process models in autonomous material and chemical space exploration pipelines. One way to address both is to introduce latent inducing variables and to choose the right approximation to the marginal log-likelihood objective. Here, we show that variational learning of the inducing points in a high-dimensional molecular descriptor space significantly improves both the prediction quality and the uncertainty estimates on test configurations from a sample molecular dynamics dataset. Additionally, we show that inducing points can learn to represent configurations of molecule types that were not present in the set used to initialize the inducing points. Among the several approximate marginal log-likelihood objectives we evaluated, the predictive log-likelihood provides both predictive quality comparable to that of the exact Gaussian process model and excellent uncertainty control. Finally, we comment on whether a machine learning model makes predictions by interpolating between molecular configurations in the high-dimensional descriptor space. We show that, contrary to intuition, and even for densely sampled molecular dynamics datasets, most predictions are made in the extrapolation regime.
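To make the role of inducing points concrete, the following is a minimal NumPy sketch of sparse Gaussian process regression, not the authors' implementation. It assumes a unit-variance RBF kernel, a Gaussian likelihood, one-dimensional toy data in place of molecular descriptors, and the standard closed-form variational posterior for a fixed set of inducing inputs `Z`. In the paper the inducing points are learned by optimizing an approximate marginal log-likelihood objective; here `Z` is kept fixed for brevity. The function names `rbf_kernel`, `sparse_gp_predict`, and `mean_log_predictive_density` are hypothetical, and the last one is only a test-time score of uncertainty quality, not the predictive log-likelihood training objective mentioned above.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Unit-variance RBF kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def sparse_gp_predict(X, y, Z, X_test, noise_var=0.01, jitter=1e-8):
    """Predictive mean and variance of y at X_test using M inducing inputs Z.

    Uses the closed-form optimal variational posterior for fixed Z,
    so the cost is O(N M^2) instead of the O(N^3) of an exact GP.
    """
    Kuu = rbf_kernel(Z, Z) + jitter * np.eye(len(Z))  # M x M
    Kuf = rbf_kernel(Z, X)                            # M x N
    Kus = rbf_kernel(Z, X_test)                       # M x N_test
    A = Kuu + Kuf @ Kuf.T / noise_var                 # M x M
    mean = Kus.T @ np.linalg.solve(A, Kuf @ y) / noise_var
    var_f = (
        1.0  # prior variance k(x, x) of the unit-variance kernel
        - np.sum(Kus * np.linalg.solve(Kuu, Kus), axis=0)
        + np.sum(Kus * np.linalg.solve(A, Kus), axis=0)
    )
    return mean, var_f + noise_var  # add observation noise for y


def mean_log_predictive_density(y_true, mean, var):
    """Average Gaussian log-density of held-out targets (higher is better)."""
    return np.mean(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y_true - mean) ** 2 / var)


# Toy usage: 200 noisy samples of sin(x), summarized by 10 inducing inputs.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
X_test = rng.uniform(0.0, 5.0, size=(50, 1))
y_test = np.sin(X_test[:, 0]) + 0.1 * rng.standard_normal(50)
Z = np.linspace(0.0, 5.0, 10)[:, None]

mean, var = sparse_gp_predict(X, y, Z, X_test)
print("RMSE:", np.sqrt(np.mean((y_test - mean) ** 2)))
print("mean log predictive density:", mean_log_predictive_density(y_test, mean, var))
```

A model with good uncertainty control scores well on the log predictive density, because that score penalizes both overconfident and underconfident variances, whereas the RMSE only measures the quality of the mean prediction.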




