Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

by Devansh Arpit, et al.

In Domain Generalization (DG) settings, models trained on a given set of training domains have notoriously chaotic performance on distribution-shifted test domains, and stochasticity in optimization (e.g., the random seed) plays a big role. This makes deep learning models unreliable in real-world settings. We first show that a simple protocol for averaging model parameters along the optimization path, starting early in training, both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between in-domain validation accuracy and out-of-domain test accuracy, which is crucial for reliable model selection. Next, we show that an ensemble of independently trained models also behaves chaotically in the DG setting. Building on this observation, we show that ensembling moving-average models from different runs, an ensemble of averages (EoA), rather than ensembling unaveraged models, increases stability and further boosts performance. On the DomainBed benchmark, with a ResNet-50 pre-trained on ImageNet, this ensemble of averages achieves 88.6% on PACS, 79.1% on VLCS, 72.5% on OfficeHome, 52.3% on TerraIncognita, and 47.4% on DomainNet, an average of 68.0%, beating ERM (without model averaging) by about 4%. We also evaluate a model pre-trained on a larger dataset, for which EoA achieves an average accuracy of 72.7%, beating its corresponding ERM baseline by 5%.
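The two ideas in the abstract, averaging parameters along the optimization path and then ensembling the averaged models from different runs, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a uniform running average that begins at a chosen step, and it assumes the ensemble combines models by averaging their softmax probabilities; the abstract specifies neither detail. The names `ParameterAverager`, `start_step`, and `ensemble_predict` are hypothetical and introduced only for illustration.

```python
import math
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flat list of values


class ParameterAverager:
    """Uniform running average of parameter snapshots (illustrative only).

    Snapshots taken before `start_step` are ignored, so the average
    covers the optimization path from `start_step` onward.
    """

    def __init__(self, start_step: int = 0) -> None:
        self.start_step = start_step
        self.count = 0
        self.average: Params = {}

    def update(self, step: int, params: Params) -> None:
        if step < self.start_step:
            return  # averaging has not started yet
        self.count += 1
        if self.count == 1:
            # First snapshot: copy it so later updates do not alias the input.
            self.average = {name: list(vals) for name, vals in params.items()}
            return
        for name, vals in params.items():
            avg = self.average[name]
            for i, x in enumerate(vals):
                # Incremental mean: avg_n = avg_{n-1} + (x - avg_{n-1}) / n
                avg[i] += (x - avg[i]) / self.count


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one example's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(logits_per_model: List[List[float]]) -> int:
    """Predict a class by averaging softmax probabilities across models.

    Each inner list holds one (averaged) model's logits for the same input.
    Averaging probabilities is an assumption made for this sketch.
    """
    probs = [softmax(logits) for logits in logits_per_model]
    n_models = len(probs)
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)
```

In a training loop, `update` would be called once per optimization step with the current weights. Each independent run would then contribute its `average` as one member of the ensemble.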




