Mandoline: Model Evaluation under Distribution Shift

07/01/2021
by Mayee Chen et al.

Machine learning models are often deployed in settings different from those they were trained and validated on, posing a challenge to practitioners who wish to predict how well a deployed model will perform on a target distribution. If an unlabeled sample from the target distribution is available, along with a labeled sample from a possibly different source distribution, standard approaches such as importance weighting can be applied to estimate performance on the target. However, importance weighting struggles when the source and target distributions have non-overlapping support or are high-dimensional. Taking inspiration from fields such as epidemiology and polling, we develop Mandoline, a new evaluation framework that mitigates these issues. Our key insight is that practitioners may have prior knowledge about the ways in which the distribution shifts, which we can use to better guide the importance weighting procedure. Specifically, users write simple "slicing functions" - noisy, potentially correlated binary functions intended to capture possible axes of distribution shift - to compute reweighted performance estimates. We further describe a density ratio estimation framework for the slices and show how its estimation error scales with slice quality and dataset size. Empirical validation on NLP and vision tasks shows that Mandoline can estimate performance on the target distribution up to 3× more accurately than standard baselines.
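The core idea - reweighting source-set performance by the ratio of slice-pattern frequencies between target and source - can be sketched in a few lines. This is a minimal illustration, not the paper's actual estimator (which handles noisy, correlated slices with a more sophisticated density-ratio framework); the function names and the per-pattern frequency-ratio weighting are assumptions for illustration only.

```python
import numpy as np
from collections import Counter

def slice_matrix(examples, slicing_functions):
    # Apply each user-written binary slicing function to every example,
    # producing an (n_examples, n_slices) 0/1 matrix.
    return np.array([[int(g(x)) for g in slicing_functions] for x in examples])

def frequency_ratio_weights(source_slices, target_slices):
    # Naive density-ratio estimate: treat each distinct slice pattern
    # as a bin and weight each source example by the target/source
    # frequency of its pattern. (Mandoline itself uses a more robust
    # estimator; this is only the underlying intuition.)
    src = [tuple(row) for row in source_slices]
    tgt = [tuple(row) for row in target_slices]
    n_s, n_t = len(src), len(tgt)
    count_s, count_t = Counter(src), Counter(tgt)
    return np.array([(count_t[p] / n_t) / (count_s[p] / n_s) for p in src])

def reweighted_accuracy(correct, weights):
    # Self-normalized importance-weighted estimate of target accuracy,
    # given per-example 0/1 correctness on the labeled source sample.
    return float(np.sum(weights * correct) / np.sum(weights))
```

For example, if a slice "x is positive" covers 50% of the source but 80% of the target, and the model is only correct on positive examples, the reweighted estimate shifts the 50% source accuracy up to the 80% one would observe on the target.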

