Agree to Disagree: Diversity through Disagreement for Better Transferability

by   Matteo Pagliardini, et al.

Gradient-based learning algorithms have an implicit simplicity bias which in effect can limit the diversity of predictors being sampled by the learning procedure. This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features – present in the training data but absent from the test data – and (ii) by only leveraging a small subset of predictive features. Such an effect is especially magnified when the test distribution does not exactly match the train distribution – referred to as the Out of Distribution (OOD) generalization problem. However, given only the training data, it is not always possible to apriori assess if a given feature is spurious or transferable. Instead, we advocate for learning an ensemble of models which capture a diverse set of predictive features. Towards this, we propose a new algorithm D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data, but disagreement on the OOD data. We show how D-BAT naturally emerges from the notion of generalized discrepancy, as well as demonstrate in multiple experiments how the proposed method can mitigate shortcut-learning, enhance uncertainty and OOD detection, as well as improve transferability.


page 2

page 6

page 13

page 14


Learning Diverse Features in Vision Transformers for Improved Generalization

Deep learning models often rely only on a small set of features even whe...

Quantifying and Improving Transferability in Domain Generalization

Out-of-distribution generalization is one of the key challenges when tra...

TRS: Transferability Reduced Ensemble via Encouraging Gradient Diversity and Model Smoothness

Adversarial Transferability is an intriguing property of adversarial exa...

Measuring the Effect of Training Data on Deep Learning Predictions via Randomized Experiments

We develop a new, principled algorithm for estimating the contribution o...

Searching for test data with feature diversity

There is an implicit assumption in software testing that more diverse an...

Transferability of Neural Network-based De-identification Systems

Methods and Materials: We investigated transferability of neural network...

A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors

In this work we establish an algorithm and distribution independent non-...

Please sign up or login with your details

Forgot password? Click here to reset