Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

11/15/2022
by Hiroki Naganuma, et al.

Modern deep learning systems are fragile and do not generalize well under distribution shifts. While much promising work has been done to address these concerns, a systematic study of the role of optimizers in out-of-distribution generalization has not been undertaken. In this study, we examine the performance of popular first-order optimizers under different classes of distribution shift, training with both empirical risk minimization (ERM) and invariant risk minimization (IRM). We address image and text classification settings, using DomainBed, WILDS, and the Backgrounds Challenge as out-of-distribution benchmarks for an exhaustive study. We search over a wide range of hyperparameters and examine in-distribution and out-of-distribution classification accuracy for over 20,000 models. We arrive at the following findings: i) contrary to conventional wisdom, adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD and momentum SGD); ii) in-distribution and out-of-distribution performance exhibit three types of behavior depending on the dataset: linear returns, increasing returns, and diminishing returns. We believe these findings can help practitioners choose the right optimizer and know what behavior to expect.
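To make the comparison concrete, the sketch below shows (in PyTorch, assuming that framework) how the three optimizer families can be swapped into a single training loop, and how an IRMv1-style invariance penalty (Arjovsky et al., 2019) turns an ERM step into an IRM step. This is a minimal illustration, not the authors' released code: the model, environments, learning rates, and the helper names (make_optimizer, irm_penalty, train_step) are all placeholders introduced here.

# Minimal sketch (not the paper's code): ERM vs. IRM training with
# swappable first-order optimizers. All hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_optimizer(name, params, lr):
    # The three optimizer families compared in the study.
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr)
    if name == "momentum_sgd":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "adam":
        return torch.optim.Adam(params, lr=lr)
    raise ValueError(f"unknown optimizer: {name}")

def irm_penalty(logits, y):
    # IRMv1 penalty (Arjovsky et al., 2019): squared gradient of the
    # per-environment risk w.r.t. a fixed scalar "dummy" classifier.
    scale = torch.ones(1, requires_grad=True)
    loss = F.cross_entropy(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2).sum()

def train_step(model, optimizer, envs, irm_weight=0.0):
    # One update over a list of (x, y) batches, one per training
    # environment; irm_weight = 0.0 recovers plain ERM.
    optimizer.zero_grad()
    total = 0.0
    for x, y in envs:
        logits = model(x)
        total = total + F.cross_entropy(logits, y)
        if irm_weight > 0:
            total = total + irm_weight * irm_penalty(logits, y)
    (total / len(envs)).backward()
    optimizer.step()

# Example: momentum SGD on a toy linear classifier with two environments.
model = nn.Linear(128, 10)
opt = make_optimizer("momentum_sgd", model.parameters(), lr=0.01)
envs = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(2)]
train_step(model, opt, envs)                  # ERM step
train_step(model, opt, envs, irm_weight=1.0)  # IRM step

In a sweep of the kind the abstract describes, the only moving parts are the optimizer name, its hyperparameters (learning rate, momentum, etc.), and the IRM penalty weight; the model and data pipeline stay fixed across runs.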


