Variance Reduction in SGD by Distributed Importance Sampling

11/20/2015
by Guillaume Alain, et al.

Humans are able to accelerate their learning by selecting training materials that are the most informative and at the appropriate level of difficulty. We propose a framework for distributing deep learning in which one set of workers searches in parallel for the most informative examples while a single worker updates the model on examples selected by importance sampling. The model is thus updated with an unbiased estimate of the gradient, one whose variance is minimized when the sampling proposal is proportional to the L2-norm of the gradient. We show experimentally that this method reduces gradient variance even when the cost of synchronization across machines cannot be ignored and the importance-sampling factors are not updated instantly across the training set.
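For intuition, below is a minimal single-machine sketch of the sampling scheme the abstract describes: examples are drawn with probability proportional to the L2-norm of their per-example gradients and re-weighted so the gradient estimate stays unbiased. The helper `per_example_grad` and the flat-parameter setup are illustrative assumptions, not the paper's distributed implementation.

```python
# Minimal sketch, assuming a flat parameter vector `theta` and a hypothetical
# helper `per_example_grad(theta, x, y)` that returns the gradient of the loss
# on a single example. Not the paper's distributed implementation.
import numpy as np

def importance_sampled_gradient(theta, X, Y, per_example_grad, batch_size, rng):
    # Proposal q_i proportional to the L2-norm of each example's gradient:
    # the variance-minimizing choice for an unbiased gradient estimator.
    norms = np.array([np.linalg.norm(per_example_grad(theta, x, y))
                      for x, y in zip(X, Y)])
    q = norms / norms.sum()

    # Draw a minibatch from the proposal, then re-weight each sampled gradient
    # by 1 / (N * q_i) so the estimate of the mean gradient over all N
    # training examples stays unbiased.
    idx = rng.choice(len(X), size=batch_size, p=q)
    grads = np.stack([per_example_grad(theta, X[i], Y[i]) for i in idx])
    weights = 1.0 / (len(X) * q[idx])
    return (weights[:, None] * grads).mean(axis=0)

# Usage sketch: rng = np.random.default_rng(0); the returned vector can be fed
# into a plain SGD step, e.g. theta -= lr * importance_sampled_gradient(...).
```

In the paper's setting, these importance factors are computed in parallel by a pool of workers and may be slightly stale by the time the model worker uses them; that staleness, together with synchronization cost, is the regime the experiments examine.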


Related research

03/02/2018  Not All Samples Are Created Equal: Deep Learning with Importance Sampling
  Deep neural network training spends most of the computation on examples ...

07/16/2022  Adaptive Sketches for Robust Regression with Importance Sampling
  We introduce data structures for solving robust regression through stoch...

02/06/2016  Importance Sampling for Minibatches
  Minibatching is a very well studied and highly popular technique in supe...

08/12/2020  Variance-reduced Language Pretraining via a Mask Proposal Network
  Self-supervised learning, a.k.a., pretraining, is important in natural l...

10/20/2019  From Importance Sampling to Doubly Robust Policy Gradient
  We show that policy gradient (PG) and its variance reduction variants ca...

02/28/2023  PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
  Based on the weight-sharing mechanism, one-shot NAS methods train a supe...

09/13/2021  Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
  For machine learning models trained with limited labeled training data, ...
