Evaluating Model Robustness to Dataset Shift

10/28/2020
by Adarsh Subbaswamy, et al.

As machine learning is increasingly deployed in safety-critical domains, evaluating the safety of these models has become increasingly important. An important aspect of this is evaluating how robust a model is to changes in setting or population, which typically requires applying the model to multiple, independent datasets. Since the cost of collecting such datasets is often prohibitive, in this paper we propose a framework for evaluating this type of robustness using a single, fixed evaluation dataset. We use the original evaluation data to define an uncertainty set of possible evaluation distributions and estimate the algorithm's performance on the "worst-case" distribution within this set. Specifically, we consider shifts in conditional distributions, allowing some parts of the data distribution to shift while others remain fixed. This yields finer-grained control over the shifts under consideration and more plausible worst-case distributions than previous approaches based on covariate shift. To address the challenges of estimation in complex, high-dimensional distributions, we derive a "debiased" estimator that maintains √N-consistency even when the nuisance parameters are estimated with machine learning methods that converge at slower rates. In experiments on a real medical risk prediction task, we show that this estimator can be used to evaluate robustness and accounts for realistic shifts that cannot be expressed as covariate shift. The proposed framework gives practitioners a means of proactively evaluating the safety of their models using a single validation dataset.
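The abstract is the only technical description available here, so the sketch below illustrates only the simplest version of the underlying idea, not the paper's debiased estimator. It assumes a bounded-density-ratio (alpha-subpopulation) uncertainty set, and the function names worst_case_risk and worst_case_risk_conditional, the stratum variable z, and the parameter alpha are all illustrative choices rather than anything specified in the paper.

```python
import numpy as np

def worst_case_risk(losses, alpha):
    """Worst-case mean loss over all evaluation distributions whose density
    ratio with respect to the observed evaluation distribution is bounded by
    1/alpha (equivalently, over every subpopulation making up at least an
    alpha fraction of the data). The supremum is attained by up-weighting the
    largest alpha-fraction of losses, i.e. the CVaR of the loss."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # largest first
    k = max(1, int(np.ceil(alpha * losses.size)))            # worst-subgroup size
    return float(losses[:k].mean())

def worst_case_risk_conditional(losses, z, alpha):
    """Plug-in bound for shifts that move only p(W | Z) while holding the
    marginal p(Z) fixed: take the worst alpha-subpopulation of the loss
    within each Z stratum and average the results under the observed p(Z).
    Because it reweights raw per-example losses rather than E[loss | W, Z],
    this is a conservative (upper) bound on the conditional-shift worst case;
    the paper's debiased estimator targets the exact quantity."""
    losses, z = np.asarray(losses, dtype=float), np.asarray(z)
    values, counts = np.unique(z, return_counts=True)
    strata = [worst_case_risk(losses[z == v], alpha) for v in values]
    return float(np.dot(counts / counts.sum(), strata))

# Toy usage: per-example losses from a single held-out evaluation set.
rng = np.random.default_rng(0)
z = rng.integers(0, 3, size=5000)              # a discrete covariate held fixed
losses = rng.exponential(scale=1.0 + 0.5 * z)  # losses that vary by stratum

print("average risk:              ", losses.mean())
print("worst case, marginal shift:", worst_case_risk(losses, alpha=0.2))
print("worst case, p(W|Z) shift:  ", worst_case_risk_conditional(losses, z, alpha=0.2))
```

As the abstract indicates, constraining the shift to a conditional distribution (here p(W | Z) with p(Z) fixed) can only shrink the uncertainty set, so the conditional worst case is never larger than the unrestricted one, which is what makes the resulting worst-case distributions more plausible.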


