Efficient Discovery of Heterogeneous Treatment Effects in Randomized Experiments via Anomalous Pattern Detection
The randomized experiment is an important tool for inferring the causal impact of an intervention. The recent literature on statistical learning methods for heterogeneous treatment effects demonstrates the utility of estimating the marginal conditional average treatment effect (MCATE), i.e., the average treatment effect for a subpopulation of respondents who share a particular subset of covariates. However, each proposed method makes its own set of restrictive assumptions about the intervention's effects, the underlying data generating processes, and which subpopulations (MCATEs) to explicitly estimate. Moreover, the majority of the literature provides no mechanism, beyond manual inspection, to identify which subpopulations are the most affected, and provides little guarantee on the correctness of the identified subpopulations. Therefore, we propose Treatment Effect Subset Scan (TESS), a new method for discovering which subpopulation in a randomized experiment is most significantly affected by a treatment. We frame this challenge as a pattern detection problem where we maximize a nonparametric scan statistic (a measure of distributional divergence) over subpopulations, while being parsimonious in which specific subpopulations to evaluate. Furthermore, we identify the subpopulation which experiences the largest distributional change as a result of the intervention, while making minimal assumptions about the intervention's effects or the underlying data generating process. In addition to the algorithm, we demonstrate that the asymptotic Type I and Type II error rates can be controlled, and provide sufficient conditions for detection consistency, i.e., exact identification of the affected subpopulation. Finally, we validate the efficacy of the method by discovering heterogeneous treatment effects in simulations and in real-world data from a well-known program evaluation study.
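To make the idea of maximizing a nonparametric scan statistic over subpopulations concrete, here is a minimal Python sketch. It is not the TESS algorithm. It assumes a Berk-Jones-style score at a single fixed significance level `alpha`, empirical p-values computed against control units with the same covariate profile, and a brute-force enumeration of subpopulations, which is only feasible for a handful of discrete covariates and is the opposite of the parsimonious search the abstract describes. All function names, parameters, and the synthetic data are hypothetical and chosen for illustration.

```python
import itertools
import numpy as np

def empirical_pvalues(treated_y, control_y):
    """One-sided empirical p-value: share of control outcomes >= each treated outcome."""
    control_sorted = np.sort(control_y)
    n_ge = len(control_sorted) - np.searchsorted(control_sorted, treated_y, side="left")
    return (n_ge + 1.0) / (len(control_sorted) + 1.0)

def berk_jones(pvals, alpha):
    """n * KL(observed share of p-values <= alpha || alpha); 0 if there is no excess."""
    n = len(pvals)
    if n == 0:
        return 0.0
    frac = float(np.mean(pvals <= alpha))
    if frac <= alpha:
        return 0.0
    kl = frac * np.log(frac / alpha)
    if frac < 1.0:
        kl += (1.0 - frac) * np.log((1.0 - frac) / (1.0 - alpha))
    return n * kl

def scan_subpopulations(X, y, treated, alpha=0.1):
    """Brute-force scan over rules that fix a value for any subset of covariates."""
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    n, d = X.shape
    pvals = np.full(n, np.nan)
    # Compare each treated unit with control units sharing its full covariate profile.
    for profile in np.unique(X, axis=0):
        cell = np.all(X == profile, axis=1)
        t, c = cell & treated, cell & ~treated
        if t.any() and c.any():
            pvals[t] = empirical_pvalues(y[t], y[c])
    # None means "covariate unconstrained"; the all-None rule is the whole population.
    levels = [[None] + np.unique(X[:, j]).tolist() for j in range(d)]
    best_rule, best_score = {}, 0.0
    for combo in itertools.product(*levels):
        rule = {j: v for j, v in enumerate(combo) if v is not None}
        mask = treated & ~np.isnan(pvals)
        for j, v in rule.items():
            mask &= (X[:, j] == v)
        score = berk_jones(pvals[mask], alpha)
        if score > best_score:
            best_rule, best_score = rule, score
    return best_rule, best_score

# Synthetic experiment (assumed for illustration): only units with x0 == 1 and
# x1 == 1 respond to treatment, via a mean shift of 2.
rng = np.random.default_rng(0)
n = 4000
X = rng.integers(0, 2, size=(n, 2))
treated = rng.random(n) < 0.5
y = rng.normal(size=n)
affected = (X[:, 0] == 1) & (X[:, 1] == 1)
y[affected & treated] += 2.0

rule, score = scan_subpopulations(X, y, treated)
print(rule, round(score, 1))
```

The returned `rule` maps covariate indices to the values that define the highest-scoring subpopulation. Because the score compares the distribution of treated outcomes with the control distribution rather than comparing means, it can in principle react to effects that change the shape of the outcome distribution, which is the motivation for a distributional scan statistic.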