Shapley value confidence intervals for variable selection in regression models
Multiple linear regression is a commonly used inferential and predictive process, whereby a single response variable is modeled via an affine combination of multiple explanatory covariates. The coefficient of determination is often used to measure the explanatory power of the chosen combination of covariates. A ranking of the explanatory contribution of each individual covariate is often sought in order to draw inference regarding the importance of each covariate with respect to the response phenomenon. A recent method for ascertaining such a ranking is the game-theoretic Shapley value decomposition of the coefficient of determination. Such a decomposition has the desirable efficiency, monotonicity, and equal treatment properties. Under an elliptical assumption, we obtain the asymptotic normality of the Shapley values. We then use this result to construct confidence intervals and hypothesis tests for these quantities. Monte Carlo studies of our results are provided. We find that our asymptotic confidence intervals are computationally superior to competing bootstrap methods and are able to improve upon the performance of the bootstrap intervals. Analyses of housing and real estate data are used to demonstrate the applicability of our methodology.
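As an illustration of the decomposition described above, the following is a minimal sketch of how the coefficient of determination can be split into per-covariate Shapley values by averaging each covariate's marginal gain in R^2 over all subsets of the remaining covariates. The function names `r_squared` and `shapley_r2` and the synthetic data are assumptions made for this example and are not taken from the paper; the sketch covers only the point estimates, not the asymptotic confidence intervals.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, subset):
    """R^2 of an ordinary least squares fit (with intercept) on the columns in `subset`."""
    if len(subset) == 0:
        return 0.0  # the intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def shapley_r2(X, y):
    """Shapley value of each covariate, treating R^2 as the value of a coalition."""
    d = X.shape[1]
    values = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Standard Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for S in combinations(others, size):
                gain = r_squared(X, y, S + (j,)) - r_squared(X, y, S)
                values[j] += weight * gain
    return values


# Synthetic data (hypothetical): the first covariate matters most, the third not at all
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)

phi = shapley_r2(X, y)
full_r2 = r_squared(X, y, (0, 1, 2))
print("Shapley values:", phi)
print("Sum of Shapley values:", phi.sum(), "Full-model R^2:", full_r2)
```

The final two printed quantities agree up to floating-point error, which is the efficiency property: the Shapley values sum to the R^2 of the full model. The enumeration visits every subset of covariates, so this direct approach is practical only for a small number of covariates.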