Choosing a Proxy Metric from Past Experiments

by Nilesh Tripuraneni, et al.

In many randomized experiments, the treatment effect on the long-term metric (i.e., the primary outcome of interest) is difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy that they are challenging to estimate faithfully in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope that they closely track the long-term metric, so that they can be used to guide decision-making effectively in the near term. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments. Our procedure first reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem that depends on the true latent treatment effects and the noise level of the experiment under consideration. We then denoise the observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments to extract estimates of the latent treatment effects for use in the optimization problem. One key insight derived from our approach is that the optimal proxy metric for a given experiment is not fixed a priori; rather, it should depend on the sample size (or effective noise level) of the randomized experiment for which it is deployed. To instantiate and evaluate our framework, we apply our methodology to a large corpus of randomized experiments from an industrial recommendation system and construct proxy metrics that perform favorably relative to several baselines.
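The dependence of the optimal proxy on sample size can be illustrated with a small numerical sketch. This is not the paper's formulation. It assumes one plausible objective (maximize the correlation between a noisy linear combination of proxies and the latent long-term effect), for which the unconstrained maximizer has a closed form similar to a mean-variance portfolio. The function names, the normalization of the weights, and all numbers below are hypothetical and chosen only for illustration.

```python
import numpy as np

def proxy_weights(latent_cov_pp, latent_cov_pn, noise_cov, n):
    """Weights maximizing corr(w @ observed_proxy_effects, latent_long_term_effect).

    Assumed model (not taken from the paper): observed proxy effects equal
    latent effects plus independent noise with covariance noise_cov / n.
    The maximizer is proportional to (latent_cov_pp + noise_cov / n)^{-1} latent_cov_pn.
    """
    total_cov = latent_cov_pp + noise_cov / n
    w = np.linalg.solve(total_cov, latent_cov_pn)
    return w / np.abs(w).sum()  # scale is arbitrary; normalize for readability

def proxy_quality(w, latent_cov_pp, latent_cov_pn, latent_var_n, noise_cov, n):
    """Correlation between the noisy composite proxy and the latent long-term effect."""
    total_cov = latent_cov_pp + noise_cov / n
    return float(w @ latent_cov_pn / np.sqrt((w @ total_cov @ w) * latent_var_n))

# Hypothetical inputs: proxy 0 tracks the long-term metric well but is noisy,
# proxy 1 tracks it less well but is measured precisely.
latent_cov_pp = np.array([[1.0, 0.2], [0.2, 1.0]])  # latent covariance among proxies
latent_cov_pn = np.array([0.8, 0.4])                # latent covariance with long-term metric
latent_var_n = 1.0                                  # latent variance of long-term metric
noise_cov = np.diag([400.0, 4.0])                   # per-unit sampling noise of each proxy

w_small = proxy_weights(latent_cov_pp, latent_cov_pn, noise_cov, n=10)
w_large = proxy_weights(latent_cov_pp, latent_cov_pn, noise_cov, n=10_000)

print("weights at n=10:    ", w_small)
print("weights at n=10000: ", w_large)
```

Under these assumed numbers, the small experiment puts most of its weight on the precise proxy, while the large experiment shifts weight toward the better-aligned but noisier one, which is the qualitative behavior the abstract describes.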




