Double-estimation-friendly inference for high-dimensional misspecified models

by Rajen D. Shah, et al.

All models may be wrong, but that is not necessarily a problem for inference. Consider the standard t-test for the significance of a variable X for predicting response Y whilst controlling for p other covariates Z in a random design linear model. This yields correct asymptotic type I error control for the null hypothesis that X is conditionally independent of Y given Z under an arbitrary regression model of Y on (X, Z), provided that a linear regression model for X on Z holds. An analogous robustness to misspecification, which we term the "double-estimation-friendly" (DEF) property, also holds for Wald tests in generalised linear models, with some small modifications. In this expository paper we explore this phenomenon, and propose methodology for high-dimensional regression settings that respects the DEF property. We advocate specifying (sparse) generalised linear regression models for both Y and the covariate of interest X; our framework gives valid inference for the conditional independence null if either of these holds. In the special case where both specifications are linear, our proposal amounts to a small modification of the popular debiased Lasso test. We also investigate constructing confidence intervals for the regression coefficient of X via inverting our tests; these have coverage guarantees even in partially linear models where the contribution of Z to Y can be arbitrary. Numerical experiments demonstrate the effectiveness of the methodology.
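The low-dimensional phenomenon described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's methodology or its numerical experiments: the sample size, the coefficient vector `gamma`, the nonlinear form of Y, and the function names are all assumptions chosen for illustration. X follows a linear model in Z with independent noise, Y depends on Z nonlinearly and not on X (so the conditional independence null holds), and the ordinary least-squares t-test for X is applied to the misspecified linear model for Y. A normal critical value of 1.96 stands in for the exact t quantile.

```python
import numpy as np


def t_statistic_for_x(y, x, Z):
    """OLS t-statistic for the coefficient of x when regressing y on (1, x, Z)."""
    n = len(y)
    design = np.column_stack([np.ones(n), x, Z])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)   # classical OLS covariance
    return coef[1] / np.sqrt(cov[1, 1])               # column 1 of the design is x


def null_rejection_rate(n=200, p=3, reps=2000, seed=0):
    """Empirical type I error of the nominal 5% t-test under the null (illustrative setup)."""
    rng = np.random.default_rng(seed)
    gamma = np.linspace(0.5, 1.5, p)  # assumed coefficients for X on Z
    rejections = 0
    for _ in range(reps):
        Z = rng.standard_normal((n, p))
        # The linear model for X on Z holds, with noise independent of Z.
        x = Z @ gamma + rng.standard_normal(n)
        # Y is nonlinear in Z and does not involve X, so X is independent of Y given Z;
        # the linear model for Y on (X, Z) is misspecified.
        y = np.sin(2 * Z[:, 0]) + Z[:, 1] ** 2 + rng.standard_normal(n)
        if abs(t_statistic_for_x(y, x, Z)) > 1.96:    # normal approximation to the t quantile
            rejections += 1
    return rejections / reps


rate = null_rejection_rate()
print(f"empirical type I error: {rate:.3f}")
```

If the property holds as stated, the printed rejection rate should sit close to the nominal 5% level even though the linear model for Y is wrong. Making X nonlinear in Z as well would remove the protection the abstract describes.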


Conditional Independence Testing in Hilbert Spaces with Applications to Functional Data Analysis

We study the problem of testing the null hypothesis that X and Y are con...

Goodness-of-fit testing in high-dimensional generalized linear models

We propose a family of tests to assess the goodness-of-fit of a high-dim...

Tests for ultrahigh-dimensional partially linear regression models

In this paper, we consider tests for ultrahigh-dimensional partially lin...

Shapley value confidence intervals for variable selection in regression models

Multiple linear regression is a commonly used inferential and predictive...

High-Dimensional Inference Based on the Leave-One-Covariate-Out LASSO Path

We propose a new measure of variable importance in high-dimensional regr...

Markov Neighborhood Regression for High-Dimensional Inference

This paper proposes an innovative method for constructing confidence int...

Inference in High-dimensional Linear Regression

We develop an approach to inference in a linear regression model when th...
