Learning under Selective Labels with Data from Heterogeneous Decision-makers: An Instrumental Variable Approach

06/13/2023
by Jian Chen, et al.

We study the problem of learning with selectively labeled data, which arises when outcomes are only partially labeled because of historical decision-making. The labeled data distribution may differ substantially from that of the full population, especially when the historical decisions and the target outcome are both affected by unobserved factors. Consequently, a prediction rule learned from only the labeled data may be severely biased when deployed to the full population. Our paper tackles this challenge by exploiting the fact that in many applications the historical decisions were made by a set of heterogeneous decision-makers. In particular, we analyze this setup in a principled instrumental variable (IV) framework. We establish conditions under which the full-population risk of any given prediction rule is point-identified from the observed data, and we provide sharp risk bounds when point identification fails. We further propose a weighted learning approach that learns prediction rules robust to the label selection bias in both identification settings. Finally, we apply our proposed approach to a semi-synthetic financial dataset and demonstrate its superior performance in the presence of selection bias.
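The selection problem described in the abstract can be illustrated with a small simulation. This is a minimal sketch under assumptions of our own, not the paper's estimator or data: the data-generating process, the three decision-makers and their leniency values, and the function name `simulate` are all hypothetical. An unobserved factor `u` drives both the outcome and the labeling decision, and cases are assigned to decision-makers at random, which is the kind of variation an IV analysis relies on.

```python
import random


def simulate(n=20000, seed=0):
    """Toy selective-labels simulation (hypothetical setup, not the paper's)."""
    rng = random.Random(seed)
    # Assumed leniency levels: a larger value means the decision-maker
    # labels (for example, approves) more cases.
    leniency = {"strict": -1.0, "moderate": 0.0, "lenient": 1.0}
    names = list(leniency)

    outcomes = []          # outcomes for the full population
    labeled_outcomes = []  # outcomes that are actually observed
    assigned = {k: 0 for k in names}
    labeled = {k: 0 for k in names}

    for _ in range(n):
        u = rng.gauss(0, 1)                      # unobserved factor
        y = 1 if u + rng.gauss(0, 1) > 0 else 0  # outcome depends on u
        dm = rng.choice(names)                   # random assignment
        d = u + leniency[dm] + rng.gauss(0, 1) > 0  # decision depends on u too
        outcomes.append(y)
        assigned[dm] += 1
        if d:
            labeled[dm] += 1
            labeled_outcomes.append(y)

    full_rate = sum(outcomes) / n
    labeled_rate = sum(labeled_outcomes) / len(labeled_outcomes)
    label_share = {k: labeled[k] / assigned[k] for k in names}
    return full_rate, labeled_rate, label_share


full_rate, labeled_rate, label_share = simulate()
print("positive rate, full population:", round(full_rate, 3))
print("positive rate, labeled only:   ", round(labeled_rate, 3))
print("share labeled per decision-maker:", label_share)
```

In this toy setup the labeled subsample over-represents positive outcomes, because the same unobserved factor raises both the chance of being labeled and the chance of a positive outcome. The share of labeled cases also varies across decision-makers, which is the heterogeneity the abstract proposes to exploit.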


research
07/02/2018

Learning under selective labels in the presence of expert consistency

We explore the problem of learning under selective labels in the context...
research
02/13/2023

SelectionBias: An R Package for Bounding Selection Bias

Selection bias can occur when subjects are included or excluded in the a...
research
02/08/2021

A Ranking Approach to Fair Classification

Algorithmic decision systems are increasingly used in areas such as hiri...
research
09/05/2022

Learning from a Biased Sample

The empirical risk minimization approach to data-driven decision making ...
research
12/19/2022

Counterfactual Risk Assessments under Unmeasured Confounding

Statistical risk assessments inform consequential decisions, such as pre...
research
01/13/2018

Fairness in Supervised Learning: An Information Theoretic Approach

Automated decision making systems are increasingly being used in real-wo...
research
01/25/2020

On the Fairness of Randomized Trials for Recommendation With Heterogeneous Demographics and Beyond

Observed events in recommendation are consequence of the decisions made ...
