Scalable Gaussian Processes for Data-Driven Design using Big Data with Categorical Factors

by Liwei Wang, et al.

Scientific and engineering problems often require artificial intelligence to aid understanding and the search for promising designs. While Gaussian processes (GPs) stand out as easy-to-use and interpretable learners, they have difficulty accommodating big datasets, categorical inputs, and multiple responses, a combination that has become a common challenge for a growing number of data-driven design applications. In this paper, we propose a GP model that uses latent variables and latent functions obtained through variational inference to address these challenges simultaneously. The method builds on the latent variable Gaussian process (LVGP) model, in which categorical factors are mapped into a continuous latent space to enable GP modeling of mixed-variable datasets. By extending variational inference to LVGP models, the large training dataset is replaced by a small set of inducing points, which addresses the scalability issue. Output response vectors are represented by a linear combination of independent latent functions, forming a flexible kernel structure that can handle multiple responses with distinct behaviors. Comparative studies demonstrate that the proposed method scales well to large datasets with over 10^4 data points and outperforms state-of-the-art machine learning methods without requiring extensive hyperparameter tuning. In addition, the method yields an interpretable latent space that provides insight into the effect of categorical factors, such as those associated with architectural building blocks and element choices in metamaterial and materials design. Our approach is demonstrated on machine learning of ternary oxide materials and on topology optimization of a multiscale compliant mechanism with aperiodic microstructures and multiple materials.
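The three ingredients named above (mapping categorical levels to a continuous latent space, replacing the data with a few inducing points, and building multiple outputs from a linear combination of independent latent functions) can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The latent coordinates in `LATENT`, the single isotropic squared-exponential kernel, and every function name (`lvgp_kernel`, `lmc_cov`, `solve`, `sparse_mean`) are hypothetical. The inducing-point predictor uses the closed-form optimum of Titsias's collapsed variational bound, which may differ from the paper's exact variational scheme, and the latent coordinates and hyperparameters, which would normally be learned from data, are simply fixed here.

```python
import math

# Hypothetical sketch only; not the authors' code.
# HYPOTHETICAL 2-D latent coordinates for one categorical factor with three
# levels. In LVGP these positions are estimated from data; here they are fixed.
LATENT = {"A": (0.0, 0.0), "B": (0.3, 0.1), "C": (1.5, -0.8)}


def lvgp_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on (continuous, categorical) input pairs.

    Each categorical level is replaced by its latent coordinate, so the kernel
    only sees continuous vectors. One isotropic lengthscale is a simplification.
    """
    (c1, z1), (c2, z2) = x1, x2
    u1 = (c1,) + LATENT[z1]
    u2 = (c2,) + LATENT[z2]
    sq = sum((a - b) ** 2 for a, b in zip(u1, u2))
    return variance * math.exp(-0.5 * sq / lengthscale ** 2)


def lmc_cov(i, j, x1, x2, W, lengthscales):
    """Covariance between output i at x1 and output j at x2 when every output is
    a weighted sum of independent latent GPs (linear model of coregionalization)."""
    return sum(W[i][q] * W[j][q] * lvgp_kernel(x1, x2, lengthscale=ls)
               for q, ls in enumerate(lengthscales))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def sparse_mean(X, y, Z, Xs, noise=0.01, jitter=1e-8):
    """Predictive mean using m inducing inputs Z (collapsed variational optimum).

    mean(x*) = k(x*, Z) Sigma^{-1} K_ZX y / noise
    Sigma    = K_ZZ + K_ZX K_XZ / noise
    Only an m x m system is solved, so the cost is O(n m^2) rather than O(n^3).
    """
    m, n = len(Z), len(X)
    Kzz = [[lvgp_kernel(a, b) + (jitter if i == j else 0.0)
            for j, b in enumerate(Z)] for i, a in enumerate(Z)]
    Kzx = [[lvgp_kernel(z, x) for x in X] for z in Z]
    Sigma = [[Kzz[i][j] + sum(Kzx[i][k] * Kzx[j][k] for k in range(n)) / noise
              for j in range(m)] for i in range(m)]
    rhs = [sum(Kzx[i][k] * y[k] for k in range(n)) / noise for i in range(m)]
    alpha = solve(Sigma, rhs)
    return [sum(lvgp_kernel(xs, Z[i]) * alpha[i] for i in range(m)) for xs in Xs]


if __name__ == "__main__":
    # Toy mixed-variable data: (continuous value, categorical level) pairs.
    X = [(0.0, "A"), (0.7, "A"), (1.4, "B"), (2.1, "B"), (0.3, "C"), (1.9, "C")]
    y = [math.sin(c) for c, _ in X]
    Z = [(0.5, "A"), (1.8, "B"), (1.0, "C")]  # 3 inducing points for 6 data points
    print(sparse_mean(X, y, Z, [(1.0, "B")]))
```

Because only the m x m matrix `Sigma` is factorized, the cost grows linearly in the number of data points for a fixed number of inducing points. Placing the inducing points at every training input recovers the exact GP predictive mean (up to the small jitter), which is a convenient sanity check.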

