Diversify and Disambiguate: Learning From Underspecified Data

02/07/2022
by Yoonho Lee, et al.

Many datasets are underspecified, which means there are several equally viable solutions for the data. Underspecified datasets can be problematic for methods that learn a single hypothesis because different functions that achieve low training loss can focus on different predictive features and thus have widely varying predictions on out-of-distribution data. We propose DivDis, a simple two-stage framework that first learns a diverse collection of hypotheses for a task by leveraging unlabeled data from the test distribution. We then disambiguate by selecting one of the discovered hypotheses using minimal additional supervision, in the form of additional labels or inspection of function visualization. We demonstrate the ability of DivDis to find hypotheses that use robust features in image classification and natural language processing problems with underspecification.
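
As a rough illustration of the disambiguation stage described above, the following minimal Python sketch picks, from a set of candidate hypotheses, the one that agrees best with a handful of labeled examples from the test distribution. It is not the authors' implementation: the function name `disambiguate`, the toy two-feature inputs, and the two hand-written hypotheses are assumptions made purely for illustration, and the diversification stage (learning the hypotheses with the help of unlabeled test-distribution data) is not shown.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch: an input is a pair of binary features,
# and a hypothesis maps an input to a binary label.
Input = Tuple[int, int]
Hypothesis = Callable[[Input], int]


def disambiguate(
    hypotheses: Sequence[Hypothesis],
    labeled_examples: Sequence[Tuple[Input, int]],
) -> Tuple[int, List[float]]:
    """Return the index of the hypothesis with the highest accuracy on a
    small labeled set, together with every hypothesis's accuracy."""
    if not hypotheses or not labeled_examples:
        raise ValueError("need at least one hypothesis and one labeled example")

    def accuracy(h: Hypothesis) -> float:
        correct = sum(1 for x, y in labeled_examples if h(x) == y)
        return correct / len(labeled_examples)

    scores = [accuracy(h) for h in hypotheses]
    # Ties are broken in favor of the earliest hypothesis.
    best = max(range(len(hypotheses)), key=lambda i: scores[i])
    return best, scores


# Toy setup (assumed): on the training data both features equal the label,
# so a hypothesis reading either feature reaches zero training error.
def uses_feature_0(x: Input) -> int:
    return x[0]


def uses_feature_1(x: Input) -> int:
    return x[1]


# A few labeled test-distribution examples in which feature 0 no longer
# tracks the label while feature 1 still does.
few_labels = [((0, 1), 1), ((1, 0), 0), ((1, 1), 1)]

best, scores = disambiguate([uses_feature_0, uses_feature_1], few_labels)
print(best, scores)
```

In this toy case three labeled examples are enough to separate the two hypotheses, because they disagree on inputs where the two features conflict. Hypotheses that agree on every labeled example cannot be told apart this way.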

research
11/05/2022

Learning to Infer from Unlabeled Data: A Semi-supervised Learning Approach for Robust Natural Language Inference

Natural Language Inference (NLI) or Recognizing Textual Entailment (RTE)...
research
05/22/2020

L2R2: Leveraging Ranking for Abductive Reasoning

The abductive natural language inference task (αNLI) is proposed to eval...
research
04/20/2020

Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision

One of the primary challenges limiting the applicability of deep learnin...
research
12/16/2021

Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets

Natural language inference (NLI) is an important task for producing usef...
research
06/07/2023

On the Joint Interaction of Models, Data, and Features

Learning features from data is one of the defining characteristics of de...
research
08/21/2023

Spurious Correlations and Where to Find Them

Spurious correlations occur when a model learns unreliable features from...
research
02/11/2022

Distributionally Robust Data Join

Suppose we are given two datasets: a labeled dataset and unlabeled datas...