Variable Selection Using Nearest Neighbor Gaussian Processes
A novel Bayesian approach to variable selection using Gaussian process regression is proposed. Selecting the most relevant variables for the problem at hand often increases interpretability and is in many cases an essential step in model regularization. The proposed method relies on so-called nearest neighbor Gaussian processes, which can be considered highly scalable approximations of classical Gaussian processes. To perform variable selection, the mean and the covariance function of the process are conditioned on a random set 𝒜. This set holds the indices of the variables that contribute to the model. The specification of a priori beliefs regarding 𝒜 allows the number of selected variables to be controlled, while so-called reference priors are assigned to the remaining model parameters. The application of the reference priors ensures that the process covariance matrix is (numerically) robust. For model inference, a Metropolis-within-Gibbs algorithm is proposed. The performance of the new approach is evaluated on simulated data, an approximation problem from computer experiments, and two real-world datasets.
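To make the role of the index set 𝒜 concrete, the following is a minimal sketch and not the authors' implementation. It assumes a zero mean function, a squared-exponential covariance, a fixed noise variance, and the natural ordering of the data; the names `sq_exp_kernel`, `nngp_loglik`, `active`, `m`, and `noise` are hypothetical. The covariance only sees the columns listed in `active` (standing in for 𝒜), and the likelihood is approximated by conditioning each observation on at most `m` nearest earlier points. The reference priors and the Metropolis-within-Gibbs sampler from the abstract are not shown.

```python
import numpy as np

def sq_exp_kernel(X1, X2, active, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance restricted to the columns in `active`."""
    A = X1[:, active]
    B = X2[:, active]
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def nngp_loglik(X, y, active, m=5, noise=1e-2, **kern):
    """Nearest neighbor approximation of a zero-mean GP log-likelihood.

    Observation i is conditioned on its (at most) m nearest neighbors
    among observations 0..i-1, with distances measured in the active
    coordinates only. This is an illustrative assumption, not the
    paper's exact construction.
    """
    active = list(active)
    n = X.shape[0]
    ll = 0.0
    for i in range(n):
        xi = X[i:i + 1]
        kii = sq_exp_kernel(xi, xi, active, **kern)[0, 0] + noise
        if i == 0:
            mean, var = 0.0, kii
        else:
            d = ((X[:i, active] - X[i, active]) ** 2).sum(axis=1)
            nb = np.argsort(d)[:m]  # indices of the nearest earlier points
            K_nn = sq_exp_kernel(X[nb], X[nb], active, **kern)
            K_nn = K_nn + noise * np.eye(len(nb))
            k_in = sq_exp_kernel(xi, X[nb], active, **kern)[0]
            w = np.linalg.solve(K_nn, k_in)
            mean = w @ y[nb]          # conditional mean given the neighbors
            var = kii - w @ k_in      # conditional variance given the neighbors
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return ll
```

In a sampler of the kind the abstract describes, a quantity like `nngp_loglik(X, y, active)` could be compared across candidate index sets, for example with and without a given variable. When `m` is at least `n - 1`, each point is conditioned on all earlier points and the sketch reduces to the exact Gaussian process log-likelihood under the same assumptions.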