Estimation and Inference for High Dimensional Generalized Linear Models: A Splitting and Smoothing Approach

03/11/2019
by Zhe Fei, et al.

For a better understanding of the molecular causes of lung cancer, the Boston Lung Cancer Study (BLCS) has generated comprehensive molecular data from both lung cancer cases and controls. It has been challenging to model such high-dimensional data with non-linear outcomes and to give accurate uncertainty measures for the estimators. To properly infer cancer risks at the molecular level, we propose a novel inference framework for generalized linear models and use it to estimate the high-dimensional SNP effects and their potential interactions with smoking. We use multi-sample splitting and smoothing to reduce the high-dimensional problem to a series of low-dimensional maximum likelihood estimations. Unlike other methods, the proposed estimator does not involve penalization or regularization and thus avoids the drawbacks of penalization when making inferences. Our estimator is asymptotically unbiased and normal, and gives confidence intervals with proper coverage. To facilitate hypothesis testing and inference on predetermined contrasts, our method can be applied to infer any fixed low-dimensional parameters in the presence of high-dimensional nuisance parameters. To demonstrate the advantages of the method, we conduct extensive simulations and analyze the BLCS SNP data, obtaining some biologically meaningful results.
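The splitting-and-smoothing idea in the abstract can be illustrated with a minimal sketch for a logistic model. This is not the authors' implementation: the abstract does not specify the selection step, the number of splits, or the variance estimator, so the marginal-correlation screen (`screen_top_k`), the 50/50 split, and the parameters `n_splits` and `k` are all assumptions made for illustration. The sketch returns only the smoothed point estimate and omits the variance estimation needed for confidence intervals.

```python
import numpy as np

def fit_logistic_mle(X, y, n_iter=25, tol=1e-8):
    """Unpenalized logistic MLE by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)           # guard against overflow in exp
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)
        grad = X.T @ (y - mu)                      # score of the unpenalized likelihood
        # Tiny jitter only stabilizes the linear solve; the score is unpenalized,
        # so a converged solution is still a root of the likelihood equations.
        hess = X.T @ (X * w[:, None]) + 1e-8 * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def screen_top_k(X, y, k):
    """Assumed selection rule: keep the k columns most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) + 1e-12)
    return np.argsort(score)[-k:]

def split_and_smooth(X, y, j, n_splits=50, k=5, seed=0):
    """Smoothed estimate of the coefficient of column j.

    For each random split: select variables on one half, fit a low-dimensional
    MLE containing column j plus the selected columns on the other half, and
    keep the coefficient of column j. The estimates are averaged over splits.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    estimates = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        sel_idx, fit_idx = perm[: n // 2], perm[n // 2:]
        selected = screen_top_k(X[sel_idx], y[sel_idx], k)
        cols = np.union1d(selected, [j])           # sorted, always contains j
        design = np.column_stack([np.ones(fit_idx.size), X[fit_idx][:, cols]])
        beta = fit_logistic_mle(design, y[fit_idx])
        pos = 1 + int(np.searchsorted(cols, j))    # +1 skips the intercept
        estimates.append(beta[pos])
    return float(np.mean(estimates))
```

Calling `split_and_smooth(X, y, j=0)` on an `n x p` design matrix `X` and a 0/1 outcome vector `y` gives the averaged estimate for the first predictor. Selecting on one half and fitting on the other keeps the selection step from contaminating the likelihood fit within each split, and averaging over many splits reduces the randomness introduced by any single split.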
