Structural randomised selection
An important problem in the analysis of high-dimensional omics data is to identify subsets of molecular variables that are associated with a phenotype of interest. This requires addressing the challenges of high dimensionality, strong multicollinearity and model uncertainty. We propose a new ensemble learning approach for improving the performance of sparse penalised regression methods, called STructural RANDomised Selection (STRANDS). The approach, that builds and improves upon the Random Lasso method, consists of two steps. In both steps, we reduce dimensionality by repeated subsampling of variables. We apply a penalised regression method to each subsampled dataset and average the results. In the first step, subsampling is informed by variable correlation structure, and in the second step, by variable importance measures from the first step. STRANDS can be used with any sparse penalised regression approach as the "base learner". Using synthetic data and real biological datasets, we demonstrate that STRANDS typically improves upon its base learner, and that taking account of the correlation structure in the first step can help to improve the efficiency with which the model space may be explored.
READ FULL TEXT