Optimal multi-environment causal regularization
In this manuscript we derive the optimal out-of-sample causal predictor for a linear system that has been observed in k+1 within-sample environments: k shifted environments and one observational environment. Each environment corresponds to a linear structural equation model (SEM) with its own shift and noise vector, both in L^2. The strengths of the shifts can be ordered, so we may speak of all shifts that are at most as strong as a given shift. We consider the space C^γ of all shifts that are at most γ times as strong as any weighted average of the observed shift vectors with weights on the unit sphere. For each β∈ℝ^p we show that the supremum of the risk functions R_Ã(β) over Ã∈C^γ admits a worst-risk decomposition into a positive linear combination of risk functions, with coefficients depending on γ. We then define the causal regularizer, β_γ, as the β that minimizes this worst-case risk. The main result of the paper is that this regularizer can be consistently estimated with a plug-in estimator outside a set of zero Lebesgue measure in the parameter space. A practical obstacle to such estimation is that it requires solving a polynomial equation of general degree, which cannot be done explicitly. We therefore also prove that an approximate plug-in estimator based on the bisection method is consistent. An interesting by-product of the proof of the main result is that the plug-in estimator of the argmin of the maximum of a finite set of quadratic risk functions is consistent outside a set of zero Lebesgue measure in the parameter space.
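To make the by-product concrete, the following is a minimal sketch, not the paper's estimator, of computing the argmin of the maximum of a finite set of quadratic risk functions in the simplest case p = 1 (scalar β). It assumes each risk has the hypothetical form a_i β² + b_i β + c_i with a_i > 0, so their maximum is convex. The paper applies bisection to a polynomial root. This sketch swaps in bisection on the sign of the right derivative of the convex max-of-quadratics instead, which shows the same idea in its most elementary form. The function names and coefficients are illustrative assumptions.

```python
import numpy as np


def worst_risk(beta, a, b, c):
    """Maximum over i of the (hypothetical) quadratic risks a_i*beta**2 + b_i*beta + c_i."""
    return np.max(a * beta**2 + b * beta + c)


def right_derivative(beta, a, b, c, tol=1e-12):
    """Right derivative of the max of quadratics: largest slope among the active pieces."""
    values = a * beta**2 + b * beta + c
    active = values >= values.max() - tol  # pieces attaining the maximum (up to tol)
    slopes = 2 * a * beta + b
    return slopes[active].max()


def argmin_worst_risk(a, b, c, lo, hi, iters=100):
    """Bisection for the minimiser of a convex max-of-quadratics on [lo, hi].

    Assumes (illustratively) that a minimiser lies in [lo, hi] and that all a_i > 0.
    If the right derivative at the midpoint is >= 0, the function is nondecreasing
    to the right, so a minimiser lies in [lo, mid]; otherwise it lies in (mid, hi].
    """
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if right_derivative(mid, a, b, c) >= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Two risks (beta - 1)**2 and (beta + 1)**2: their maximum is (|beta| + 1)**2,
# which is minimised at beta = 0 with worst-case risk 1.
a = np.array([1.0, 1.0])
b = np.array([-2.0, 2.0])
c = np.array([1.0, 1.0])
beta_star = argmin_worst_risk(a, b, c, lo=-5.0, hi=5.0)
print(beta_star, worst_risk(beta_star, a, b, c))
```

In this toy example the minimiser sits exactly at the kink where the two risks are equal, which is the typical situation for a minimax of quadratics: neither individual risk is minimised there, only their maximum.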