Essential Regression
Essential Regression is a new type of latent factor regression model in which unobserved factors Z ∈ ℝ^K influence both the response Y ∈ ℝ and the covariates X ∈ ℝ^p linearly, with K ≪ p. Its novelty lies in the conditions that give Z an interpretable meaning and render the regression coefficients β ∈ ℝ^K relating Y to Z, along with other important parameters of the model, identifiable. It provides tools for high-dimensional regression modelling that are especially powerful when the relationship between a response and essential representatives Z of the X-variables is of interest. Since Z is often neither identifiable nor practically interpretable in classical factor regression models, inference for β is not of direct interest there and has received little attention. We bridge this gap in E-Regression models: we develop a computationally efficient estimator of β and show that it is minimax-rate optimal (in Euclidean norm) and component-wise asymptotically normal, with small asymptotic variance. Inference in Essential Regression is performed after consistently estimating the unknown dimension K and the K subsets of the X-variables that explain, respectively, the individual components of Z. It is valid uniformly in β ∈ ℝ^K, in contrast with existing results on inference in sparse regression after consistent support recovery, which fail for regression coefficients of Y on X near zero. Prediction of Y from X under Essential Regression complements, in a low signal-to-noise-ratio regime, the battery of methods developed for prediction under other factor regression model specifications. Like those methods, it is particularly powerful when p is large, with further refinements made possible by our model specifications.
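To make the data-generating mechanism concrete, here is a minimal simulation sketch of the kind of model the abstract describes: latent factors Z linearly drive both the covariates X and the response Y, with K ≪ p. The loading matrix A, the noise levels, and the dimensions below are illustrative assumptions for this sketch only; they do not reproduce the paper's identifiability conditions or its estimator.

```python
import numpy as np

# Illustrative simulation of a latent factor regression of the Essential
# Regression type: Y depends linearly on K latent factors Z, and the p
# observed covariates X are noisy linear combinations of the same factors.
# All numerical choices here are assumptions made for the sketch.

rng = np.random.default_rng(0)

n, p, K = 500, 200, 3                     # sample size, ambient dimension, latent dimension (K << p)
beta = np.array([1.0, -0.5, 0.25])        # regression coefficients of Y on Z

Z = rng.normal(size=(n, K))               # unobserved latent factors
A = rng.normal(size=(p, K))               # factor loadings (the paper requires extra
                                          # structure on A for identifiability; omitted here)
X = Z @ A.T + 0.5 * rng.normal(size=(n, p))   # observed covariates
Y = Z @ beta + 0.3 * rng.normal(size=n)       # observed response

print(X.shape, Y.shape)                   # (500, 200) (500,)
```

In practice only X and Y are observed, and the paper's contribution is to identify and estimate K, the relevant subsets of X-variables, and β from such data; the sketch above only generates data consistent with that model.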