Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training

by Abraham J. Fetterman et al.

Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute. Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a practical method for robustly tuning large models, we present Cost-Aware Pareto Region Bayesian Search (CARBS), a Bayesian optimization algorithm that performs local search around the performance-cost Pareto frontier. CARBS does well even in unbounded search spaces with many hyperparameters, learns scaling relationships so that it can tune models even as they are scaled up, and automates much of the "black magic" of tuning. Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline (PPO, as provided in the original ProcGen paper). We also reproduce the model size vs. training tokens scaling result from the Chinchilla project (Hoffmann et al. 2022), while simultaneously discovering scaling laws for every other hyperparameter, via an easy automated process that uses significantly less compute and is applicable to any deep learning problem (not just language models).
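To make the core idea concrete, here is a minimal illustrative sketch (not the authors' implementation) of local search around a cost-performance Pareto frontier: keep only observations that are not dominated in (cost, score), then propose new candidates by perturbing the hyperparameters of Pareto-optimal configurations. The function names, the multiplicative Gaussian perturbation, and the candidate count are assumptions for illustration; CARBS itself uses a full Bayesian surrogate model rather than random perturbation.

```python
import random

def pareto_front(observations):
    """Return the (cost, score) points not dominated by any other.

    Lower cost and higher score are better.  A point is dominated if
    another point is at least as good on both axes and strictly
    better on at least one.
    """
    front = []
    for c, s in observations:
        dominated = any(
            (c2 <= c and s2 >= s) and (c2 < c or s2 > s)
            for c2, s2 in observations
        )
        if not dominated:
            front.append((c, s))
    return front

def propose_candidates(configs, scores, costs, n=4, sigma=0.2, rng=None):
    """Local search around the Pareto frontier: sample a Pareto-optimal
    config and jitter each hyperparameter multiplicatively."""
    rng = rng or random.Random(0)
    front = set(pareto_front(list(zip(costs, scores))))
    front_idx = [i for i, p in enumerate(zip(costs, scores)) if p in front]
    candidates = []
    for _ in range(n):
        i = rng.choice(front_idx)
        candidates.append(
            {k: v * (1.0 + rng.gauss(0, sigma)) for k, v in configs[i].items()}
        )
    return candidates
```

Because candidates are generated only near configurations that are already cost-efficient, the search naturally traces out how good hyperparameters shift as the compute budget grows, which is the mechanism behind the learned scaling relationships described above.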

