Markov Neighborhood Regression for High-Dimensional Inference

by Faming Liang, et al.

This paper proposes a method for constructing confidence intervals and computing p-values in high-dimensional linear models. The method breaks the high-dimensional inference problem into a series of low-dimensional inference problems: for each regression coefficient β_i, the confidence interval and p-value are computed by regressing on a subset of variables selected according to the conditional independence relations between the corresponding variable X_i and the other variables. Because this subset forms a Markov neighborhood of X_i in the Markov network formed by all the variables X_1, X_2, …, X_p, the method is called Markov neighborhood regression. The method is tested on high-dimensional linear, logistic, and Cox regression, and the numerical results indicate that it significantly outperforms existing methods. Based on Markov neighborhood regression, a method for learning causal structures in high-dimensional linear models is proposed and applied to the identification of drug-sensitive genes and cancer driver genes. The idea of using conditional independence relations for dimension reduction is general and can potentially be extended to other high-dimensional or big-data problems.
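
To make the "series of low-dimensional problems" idea concrete, here is a minimal Python sketch of one such subproblem. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the Markov neighborhood is estimated, so the sketch uses a simple correlation ranking (`top_k_correlated`, a hypothetical helper) as a stand-in for a proper conditional independence procedure. It also assumes the regression subset includes a few variables strongly associated with the response, and it uses a normal approximation for the p-value and interval. The names `mnr_coefficient`, `k_neigh`, and `k_sel` are likewise made up for this example.

```python
import math
import numpy as np


def top_k_correlated(target, candidates, k):
    """Column indices of `candidates` with the largest |correlation| with `target`.

    Hypothetical stand-in for a real neighborhood / variable selection step.
    """
    t = target - target.mean()
    c = candidates - candidates.mean(axis=0)
    denom = np.linalg.norm(c, axis=0) * np.linalg.norm(t) + 1e-12
    corr = np.abs(c.T @ t) / denom
    return np.argsort(corr)[::-1][:k]


def mnr_coefficient(X, y, i, k_neigh=5, k_sel=5, z_crit=1.959964):
    """Low-dimensional inference for beta_i: estimate, 95% interval, p-value."""
    n, p = X.shape
    others = np.array([j for j in range(p) if j != i])

    # Assumed stand-in for the Markov neighborhood of X_i.
    neigh = others[top_k_correlated(X[:, i], X[:, others], k_neigh)]
    # Assumption of this sketch: also keep variables associated with y.
    sel = top_k_correlated(y, X, k_sel)

    extra = sorted((set(neigh.tolist()) | set(sel.tolist())) - {i})
    subset = [i] + extra

    # Ordinary least squares on the small subset, with an intercept.
    Z = np.column_stack([np.ones(n), X[:, subset]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)

    est = float(beta[1])            # column 1 of Z is X_i
    se = float(math.sqrt(cov[1, 1]))
    z = est / se
    # Normal approximation; reasonable when n is much larger than the subset.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return est, (est - z_crit * se, est + z_crit * se), p_value
```

Calling `mnr_coefficient(X, y, i)` for each `i` in turn yields one interval and one p-value per coefficient, each obtained from a regression whose size is controlled by `k_neigh` and `k_sel` rather than by the total number of variables.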




SIHR: An R Package for Statistical Inference in High-dimensional Linear and Logistic Regression Models

We introduce and illustrate through numerical examples the R package wh...

Markov Boundary Discovery with Ridge Regularized Linear Models

Ridge regularized linear models (RRLMs), such as ridge regression and th...

A Double Regression Method for Graphical Modeling of High-dimensional Nonlinear and Non-Gaussian Data

Graphical models have long been studied in statistics as a tool for infe...

Statistical Inference in High-Dimensional Generalized Linear Models with Asymmetric Link Functions

We have developed a statistical inference method applicable to a broad r...

The Landmark Selection Method for Multiple Output Prediction

Conditional modeling x → y is a central problem in machine learning. A s...

Cox reduction and confidence sets of models: a theoretical elucidation

For sparse high-dimensional regression problems, Cox and Battey [1, 9] e...

Double-estimation-friendly inference for high-dimensional misspecified models

All models may be wrong—but that is not necessarily a problem for infere...
