Adaptive Bayesian SLOPE – High-dimensional Model Selection with Missing Values
Variable selection with high-dimensional data containing missing values is a major challenge, and very few methods are available to address it. Here we propose a new method, adaptive Bayesian SLOPE, which extends SLOPE (sorted ℓ1 regularization) to a Bayesian framework and allows simultaneous parameter estimation and variable selection for large datasets despite missing values. The method follows the idea of the Spike and Slab LASSO but replaces the Laplace mixture prior with the frequentist-motivated SLOPE prior, which targets control of the false discovery rate (FDR). The regression parameters and the noise variance are estimated with a stochastic approximation EM (SAEM) algorithm, which can incorporate missing values as well as latent model parameters such as the signal magnitude and its sparsity. Extensive simulations show good behavior in terms of power, FDR, and estimation bias across a wide range of scenarios. Finally, we apply the method to data on severely traumatized patients from Paris hospitals to predict platelet levels. Beyond selecting relevant variables, which is crucial for interpretation, the method shows excellent predictive performance. The methodology is implemented in the R package ABSLOPE, which incorporates C++ code for efficiency.
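To make "sorted ℓ1 regularization" concrete, the sketch below shows the frequentist SLOPE building blocks that the Bayesian prior is modeled on. It is not the ABSLOPE prior or its SAEM code. It computes the penalty J(β) = Σ λ_i |β|_(i), where |β|_(1) ≥ … ≥ |β|_(p) and λ_1 ≥ … ≥ λ_p ≥ 0. It also evaluates the proximal operator of J by sorting, applying pool-adjacent-violators isotonic regression, and clipping at zero. The Benjamini-Hochberg-type sequence λ_i = Φ⁻¹(1 − iq/(2p)) is the usual FDR-motivated choice for SLOPE and is assumed here. The ABSLOPE package may scale or adapt it differently. The function names `bh_lambda`, `slope_penalty`, and `prox_sorted_l1` are hypothetical.

```python
from math import copysign
from statistics import NormalDist


def bh_lambda(p, q):
    """Assumed BH-type decreasing sequence: lambda_i = Phi^{-1}(1 - i*q/(2p)), i = 1..p."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def slope_penalty(beta, lam):
    """Sorted l1 norm: the largest |beta_j| is paired with the largest lambda."""
    abs_sorted = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * a for l, a in zip(lam, abs_sorted))


def prox_sorted_l1(y, lam):
    """argmin_x 0.5*||x - y||^2 + slope_penalty(x, lam), for nonincreasing lam >= 0."""
    p = len(y)
    # Indices of y ordered by decreasing magnitude.
    order = sorted(range(p), key=lambda i: abs(y[i]), reverse=True)
    # Shift each sorted magnitude by the lambda of the same rank.
    z = [abs(y[i]) - lam[k] for k, i in enumerate(order)]
    # Pool-adjacent-violators: best nonincreasing fit to z.
    blocks = []  # each block holds [sum of pooled values, count]
    for v in z:
        blocks.append([v, 1])
        # Merge while the previous block mean does not exceed the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] <= blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    # Expand block means and clip at zero (small coefficients become exactly 0).
    fitted = []
    for s, c in blocks:
        fitted.extend([max(s / c, 0.0)] * c)
    # Undo the sorting and restore the original signs.
    x = [0.0] * p
    for k, i in enumerate(order):
        x[i] = copysign(fitted[k], y[i])
    return x


if __name__ == "__main__":
    lam = bh_lambda(5, 0.1)
    print(slope_penalty([4.0, -0.3, 2.5, 2.4, 0.1], lam))
    print(prox_sorted_l1([4.0, -0.3, 2.5, 2.4, 0.1], lam))
```

With equal λ_i the operator reduces to LASSO soft-thresholding. With decreasing λ_i, coefficients of similar magnitude can be pooled to a common value. For example, `prox_sorted_l1([3.0, 3.0], [2.0, 1.0])` gives both entries 1.5, which illustrates the clustering behavior of the sorted penalty.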