Automatic prior selection for meta Bayesian optimization with a case study on tuning deep neural network optimizers

by Zi Wang, et al.

The performance of deep neural networks can be highly sensitive to the choice of a variety of meta-parameters, such as optimizer parameters and model hyperparameters. Tuning these well, however, often requires extensive and costly experimentation. Bayesian optimization (BO) is a principled approach to solving such expensive hyperparameter tuning problems efficiently. Key to the performance of BO is specifying and refining a distribution over functions, which is used to reason about the optima of the underlying function being optimized. In this work, we consider the scenario where we have data from similar functions that allows us to specify a tighter distribution a priori. Specifically, we focus on the common but potentially costly task of tuning optimizer parameters for training neural networks. Building on the meta BO method from Wang et al. (2018), we develop practical improvements that (a) boost its performance by leveraging tuning results on multiple tasks without requiring observations for the same meta-parameter points across all tasks, and (b) retain its regret bound for a special case of our method. As a result, we provide a coherent BO solution for iterative optimization of continuous optimizer parameters. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that, on average, our method locates good hyperparameters at least 3 times more efficiently than the best competing methods.
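To make the idea of a prior learned from similar functions concrete, here is a minimal Python sketch. It is not the method of this paper: it assumes the simpler setting in which every past task was evaluated at the same finite set of candidate points, so the Gaussian process prior mean and covariance can be estimated directly as the sample mean and sample covariance across tasks. The function names (`estimate_prior`, `posterior`, `bo_ucb`), the noise level, and the UCB coefficient `beta` are all illustrative assumptions.

```python
import numpy as np

def estimate_prior(Y):
    """Estimate a GP prior on a shared grid of candidate points.

    Y has shape (num_tasks, num_points): one row per past tuning task,
    all evaluated at the same candidates (an assumption of this sketch).
    """
    mu = Y.mean(axis=0)
    K = np.cov(Y, rowvar=False)  # sample covariance, (num_points, num_points)
    return mu, K

def posterior(mu, K, idx, y, noise=1e-3):
    """GP posterior mean and variance on the grid, given values y at indices idx."""
    idx = np.asarray(idx)
    K_oo = K[np.ix_(idx, idx)] + noise * np.eye(len(idx))
    K_ao = K[:, idx]
    # Solve once for both the mean term and the variance term.
    sol = np.linalg.solve(K_oo, np.column_stack([y - mu[idx], K_ao.T]))
    mean = mu + K_ao @ sol[:, 0]
    var = np.diag(K) - np.einsum("ij,ji->i", K_ao, sol[:, 1:])
    return mean, np.maximum(var, 0.0)

def bo_ucb(f, mu, K, num_iters=5, beta=2.0):
    """Maximize f over grid indices with an upper-confidence-bound rule.

    f maps a grid index to an observed value (hypothetical interface).
    """
    idx, y = [], []
    mean, var = mu, np.diag(K)
    for _ in range(num_iters):
        ucb = mean + beta * np.sqrt(var)
        if idx:
            ucb[idx] = -np.inf  # do not query the same candidate twice
        i = int(np.argmax(ucb))
        idx.append(i)
        y.append(f(i))
        mean, var = posterior(mu, K, idx, np.array(y))
    best = int(np.argmax(y))
    return idx[best], y[best]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 20)
    # Synthetic "past tasks": scaled and shifted copies of one shape.
    Y = np.stack([rng.uniform(0.5, 1.5) * np.sin(3 * x) + rng.normal()
                  for _ in range(30)])
    mu, K = estimate_prior(Y)
    new_task = 1.2 * np.sin(3 * x) + 0.3
    print(bo_ucb(lambda i: float(new_task[i]), mu, K))
```

The shared-grid assumption is exactly the restriction that the abstract says the proposed method removes, so this sketch should be read only as an illustration of why data from related tasks can tighten the prior.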


Pre-training helps Bayesian optimization too

Bayesian optimization (BO) has become a popular strategy for global opti...

Rich Feature Construction for the Optimization-Generalization Dilemma

There often is a dilemma between ease of optimization and robust out-of-...

Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning

Bayesian optimization is popular for optimizing time-consuming black-box...

Hyperparameter Transfer Learning with Adaptive Complexity

Bayesian optimization (BO) is a sample efficient approach to automatical...

Unbounded Bayesian Optimization via Regularization

Bayesian optimization has recently emerged as a popular and efficient to...

Towards Learning Universal Hyperparameter Optimizers with Transformers

Meta-learning hyperparameter optimization (HPO) algorithms from prior ex...

Weighted Sampling for Combined Model Selection and Hyperparameter Tuning

The combined algorithm selection and hyperparameter tuning (CASH) proble...
