A Consistent and Scalable Algorithm for Best Subset Selection in Single Index Models

09/12/2023
by Borui Tang, et al.

Analysis of high-dimensional data has led to increased interest in both single index models (SIMs) and best subset selection. SIMs provide an interpretable and flexible modeling framework for high-dimensional data, while best subset selection aims to find a sparse model from a large set of predictors. However, best subset selection in high-dimensional models is known to be computationally intractable. Existing methods tend to relax the selection problem, but do not yield the best subset solution. In this paper, we directly tackle the intractability by proposing the first provably scalable algorithm for best subset selection in high-dimensional SIMs. Our algorithmic solution enjoys subset selection consistency and has the oracle property with high probability. The algorithm incorporates a generalized information criterion to determine the support size of the regression coefficients, eliminating the need for model selection tuning. Moreover, our method assumes neither an error distribution nor a specific link function and hence is flexible to apply. Extensive simulation results demonstrate that our method is not only computationally efficient but also able to exactly recover the best subset in various settings (e.g., linear regression, Poisson regression, heteroscedastic models).
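To make the two ingredients named in the abstract concrete (searching over subsets and choosing the support size with an information criterion), here is a minimal Python sketch. It is not the paper's algorithm: it uses brute-force enumeration, which is exactly the intractable approach the paper aims to avoid, and ordinary least squares as the fitting loss. The function names `best_subset_by_size` and `select_support` are hypothetical, and the criterion `n*log(RSS/n) + k*log(p)*log(log(n))` is an assumed, illustrative form of a generalized information criterion, not one taken from the paper.

```python
import itertools

import numpy as np


def best_subset_by_size(X, y, k):
    """Exhaustively find the size-k column subset with the smallest
    residual sum of squares (RSS). Cost grows combinatorially in p."""
    p = X.shape[1]
    best_rss, best_support = np.inf, ()
    for support in itertools.combinations(range(p), k):
        Xs = X[:, list(support)]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_support = rss, support
    return best_support, best_rss


def select_support(X, y, max_size):
    """Pick the support size by minimizing an information criterion.

    The penalty used here is an assumption made for illustration only.
    """
    n, p = X.shape
    best_crit, best_support = np.inf, ()
    for k in range(1, max_size + 1):
        support, rss = best_subset_by_size(X, y, k)
        crit = n * np.log(rss / n) + k * np.log(p) * np.log(np.log(n))
        if crit < best_crit:
            best_crit, best_support = crit, support
    return best_support


if __name__ == "__main__":
    # Toy linear-regression data with two active predictors (columns 0 and 3).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 8))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(200)
    print(select_support(X, y, max_size=4))
```

The enumeration visits every subset of each size, so it is only usable for a handful of predictors; that cost is the computational obstacle the abstract refers to.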


research
12/13/2019

Best Subset Selection in Reduced Rank Regression

Reduced rank regression is popularly used for modeling the relationship ...
research
06/10/2019

Selection consistency of Lasso-based procedures for misspecified high-dimensional binary model and random regressors

We consider selection of random predictors for high-dimensional regressi...
research
09/06/2021

Bayesian data selection

Insights into complex, high-dimensional data can be obtained by discover...
research
09/08/2020

Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression

Given m d-dimensional responsors and n d-dimensional predictors, sparse ...
research
11/09/2017

Oracle inequalities for sign constrained generalized linear models

High-dimensional data have recently been analyzed because of data collec...
research
01/19/2017

Parameter Selection Algorithm For Continuous Variables

In this article, we propose a new algorithm for supervised learning meth...
research
12/05/2011

On best subset regression

In this paper we discuss the variable selection method from ℓ0-norm cons...
