Kernel Conditional Moment Constraints for Confounding Robust Inference

02/26/2023
by Kei Ishikawa, et al.

We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to a coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimates of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound on the policy value. Our estimator contains the recently proposed sharp estimator of Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation of the conditional moment constraints, which traditional non-sharp estimators fail to take into account. In our theoretical analysis, we provide a condition on the choice of kernel that guarantees no specification error biasing the lower-bound estimation, and we establish consistency guarantees for both policy evaluation and policy learning. Experiments with synthetic and real-world data demonstrate the effectiveness of the proposed method.
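To make the baseline concrete: the "coarse relaxation" the abstract refers to is, in the classical marginal sensitivity model (MSM), a per-sample box constraint on the inverse-propensity weights, minimized without the conditional moment constraints that the paper's kernel method restores. The sketch below is an illustrative, unnormalized form of that non-sharp box-relaxation lower bound; the function name and the exact estimator form are assumptions for exposition, not the paper's proposed estimator.

```python
import numpy as np

def msm_lower_bound(y, pi_target, e_behavior, lam):
    """Conservative (non-sharp) lower bound on the policy value under the
    marginal sensitivity model with odds-ratio parameter lam >= 1.

    y          : observed rewards for the logged actions
    pi_target  : target-policy probabilities of the logged actions
    e_behavior : nominal behavior-policy propensities of the logged actions

    Illustrative sketch only: uses an unnormalized IPW objective and
    ignores conditional moment constraints, hence the bound is loose.
    """
    # MSM bounds the odds ratio between nominal and true propensities by
    # lam, which yields a per-sample interval for the inverse weights 1/e.
    lo = 1.0 + (1.0 / lam) * (1.0 / e_behavior - 1.0)
    hi = 1.0 + lam * (1.0 / e_behavior - 1.0)
    # Worst case over the box: down-weight positive rewards, up-weight
    # negative ones (each weight is chosen independently per sample).
    w = np.where(y >= 0, lo, hi)
    return np.mean(w * pi_target * y)
```

At lam = 1 the interval collapses and the bound reduces to the ordinary IPW estimate; as lam grows the bound only loosens, because each weight is optimized independently, with no constraint tying the weights to the conditional distribution of outcomes. Removing exactly that slack is what the conditional moment constraints, approximated via the kernel method, accomplish.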

