On the Tunability of Optimizers in Deep Learning

by Prabhu Teja Sivaprasad, et al.

There is no consensus yet on whether adaptive gradient methods like Adam are easier to use than non-adaptive optimization methods like SGD. In this work, we fill in the important yet ambiguous concept of "ease of use" by defining an optimizer's tunability: how easy is it to find good hyperparameter configurations using automatic random hyperparameter search? We propose a practical and universal quantitative measure of optimizer tunability that can form the basis for a fair optimizer benchmark. Evaluating a variety of optimizers on an extensive set of standard datasets and architectures, we find that Adam is the most tunable for the majority of problems, especially when the budget for hyperparameter tuning is low.
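To make the idea of tunability under a tuning budget concrete, here is a minimal toy sketch in Python. It is not the measure proposed in the paper. It only illustrates the question the abstract poses: how good is the best configuration that random search finds after k trials? The scoring function, the `sensitivity` parameter, the search range, and all names are hypothetical stand-ins for training a real model.

```python
import math
import random


def toy_validation_score(lr: float, sensitivity: float) -> float:
    """Hypothetical stand-in for 'train and evaluate': peaks at lr = 1e-3.

    A larger `sensitivity` means the score drops faster away from the optimum,
    i.e. the (imaginary) optimizer is harder to tune.
    """
    return math.exp(-sensitivity * (math.log10(lr) + 3.0) ** 2)


def best_score_by_budget(sensitivity, budgets, repeats=200, seed=0):
    """Average best score found by random search, for each budget in `budgets`.

    Each repeat draws max(budgets) learning rates log-uniformly from
    [1e-6, 1] and records the best score among the first k draws.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(budgets)
    for _ in range(repeats):
        scores = [
            toy_validation_score(10 ** rng.uniform(-6.0, 0.0), sensitivity)
            for _ in range(max(budgets))
        ]
        for i, k in enumerate(budgets):
            totals[i] += max(scores[:k])
    return [total / repeats for total in totals]


budgets = [1, 4, 16, 64]
forgiving = best_score_by_budget(0.5, budgets)  # broad optimum
touchy = best_score_by_budget(5.0, budgets)     # narrow optimum

for k, a, b in zip(budgets, forgiving, touchy):
    print(f"budget={k:3d}  forgiving={a:.3f}  touchy={b:.3f}")
```

In this toy setting, the optimizer with the broad optimum reaches a good score with far fewer trials, which is the intuition behind calling it more tunable at low budgets. A real benchmark would replace `toy_validation_score` with actual training runs.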


Is One Hyperparameter Optimizer Enough?

Hyperparameter tuning is the black art of automatically finding a good c...

Adaptive Optimizer for Automated Hyperparameter Optimization Problem

The choices of hyperparameters have critical effects on the performance ...

Descending through a Crowded Valley – Benchmarking Deep Learning Optimizers

Choosing the optimizer is among the most crucial decisions of deep learn...

The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection

Hyperparameter optimization is a ubiquitous challenge in machine learnin...

Squirrel: A Switching Hyperparameter Optimizer

In this short note, we describe our submission to the NeurIPS 2020 BBO c...

Efficient Non-Parametric Optimizer Search for Diverse Tasks

Efficient and automated design of optimizers plays a crucial role in ful...

What can linear interpolation of neural network loss landscapes tell us?

Studying neural network loss landscapes provides insights into the natur...

Code Repositories


HYPAOBT - HYperParameter-Aware Optimizer Benchmarking Protocol
