Interpretation of High-Dimensional Linear Regression: Effects of Nullspace and Regularization Demonstrated on Battery Data

by Joachim Schaeffer, et al.

High-dimensional linear regression is important in many scientific fields. This article considers discrete measured data of underlying smooth latent processes, as is often obtained from chemical or biological systems. Interpretation in high dimensions is challenging because the nullspace and its interplay with regularization shape the regression coefficients. The data's nullspace contains all coefficients that satisfy 𝐗𝐰=0, thus allowing very different coefficients to yield identical predictions. We developed an optimization formulation to compare regression coefficients with coefficients obtained from physical engineering knowledge, in order to understand which part of the coefficient differences is close to the nullspace. This nullspace method is tested on a synthetic example and lithium-ion battery data. The case studies show that regularization and z-scoring are design choices that, if chosen in accordance with prior physical knowledge, lead to interpretable regression results. Otherwise, the combination of the nullspace and regularization hinders interpretability and can make it impossible to obtain regression coefficients close to the true coefficients when there is a true underlying linear model. Furthermore, we demonstrate that regression methods that do not produce coefficients orthogonal to the nullspace, such as fused lasso, can improve interpretability. In conclusion, the insights gained from the nullspace perspective help to make informed design choices for building regression models on high-dimensional data and reasoning about potential underlying linear models, which are important for system optimization and improving scientific understanding.
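The nullspace statement in the abstract can be illustrated with a small numerical sketch. This is not the authors' optimization formulation; it only shows, on synthetic random data, that adding any nullspace vector to a coefficient vector leaves the predictions unchanged, and that the minimum-norm least-squares solution has no component in the nullspace. The dimensions, the random seed, and all variable names (`X`, `w_true`, `N`, `w_alt`, `w_min`) are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic setup (assumed for illustration): fewer samples than features,
# so the data matrix X has a non-trivial nullspace.
rng = np.random.default_rng(0)
n, p = 5, 20
X = rng.standard_normal((n, p))
w_true = rng.standard_normal(p)
y = X @ w_true

# Orthonormal basis of the nullspace of X from the full SVD:
# the right singular vectors beyond the numerical rank span {w : Xw = 0}.
U, s, Vt = np.linalg.svd(X, full_matrices=True)
rank = int(np.sum(s > 1e-10 * s[0]))
N = Vt[rank:].T  # shape (p, p - rank)

# Add an arbitrary nullspace vector: coefficients change, predictions do not.
v = N @ rng.standard_normal(N.shape[1])
w_alt = w_true + v

# Minimum-norm least-squares solution (via the pseudoinverse) is
# orthogonal to the nullspace.
w_min = np.linalg.pinv(X) @ y

pred_gap = np.max(np.abs(X @ w_alt - X @ w_true))  # numerically zero
coef_gap = np.linalg.norm(w_alt - w_true)          # clearly non-zero
null_component = np.linalg.norm(N.T @ w_min)       # numerically zero

print(f"max prediction difference: {pred_gap:.2e}")
print(f"coefficient difference:    {coef_gap:.2e}")
print(f"nullspace part of w_min:   {null_component:.2e}")
```

In this sketch `w_true` and `w_alt` fit the data equally well, which is why comparing coefficients alone can be misleading unless the nullspace component of their difference is taken into account.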




