Meta-descent for Online, Continual Prediction

07/17/2019
by Andrew Jacobsen, et al.

This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second-order update: a vector approximation of the inverse Hessian. Another family of approaches uses meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been as extensively explored as quasi-second-order methods. We first derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or even those with accelerations, such as RMSProp. We provide an empirical comparison of methods from both families. We conclude that methods from both families can perform well, but in non-stationary prediction problems the meta-descent methods exhibit advantages. Our method is particularly robust across several prediction problems, and is competitive with the state-of-the-art method on a large-scale, time-series prediction problem on real data from a mobile robot.
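
To make the idea of meta-descent on a vector of step-sizes concrete, the following is a minimal Python sketch of IDBD (Sutton, 1992), an earlier and simpler meta-descent method for linear prediction. It is not AdaGain, whose update is not given in this abstract. The function names, the meta step-size `theta=0.01`, the initial step-size of 0.05, and the toy drifting-target task in `run_demo` are all assumptions chosen for illustration.

```python
import math
import random


def idbd_update(w, beta, h, x, y, theta=0.01):
    """One IDBD step for a linear predictor; mutates w, beta, h in place.

    w:    weights
    beta: log step-sizes (one per weight), so alpha_i = exp(beta_i)
    h:    decaying traces of recent weight updates
    theta is the meta step-size (an illustrative default, not from the paper).
    """
    delta = y - sum(wi * xi for wi, xi in zip(w, x))  # prediction error
    for i, xi in enumerate(x):
        # Meta-gradient step on the log step-size.
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])
        # Base update, scaled by this weight's own step-size.
        w[i] += alpha * delta * xi
        # Update the trace; the max() keeps the decay factor non-negative.
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta


def run_demo(steps=20000, n_features=20, n_relevant=5, seed=0):
    """Hypothetical non-stationary task: only the first n_relevant inputs
    affect the target, and one of their weights flips sign every 20 steps."""
    rng = random.Random(seed)
    target = [1.0] * n_relevant + [0.0] * (n_features - n_relevant)
    w = [0.0] * n_features
    beta = [math.log(0.05)] * n_features  # initial step-size 0.05 (assumed)
    h = [0.0] * n_features
    for t in range(steps):
        if t % 20 == 0:
            j = rng.randrange(n_relevant)
            target[j] = -target[j]
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        y = sum(ti * xi for ti, xi in zip(target, x))
        idbd_update(w, beta, h, x, y)
    return [math.exp(b) for b in beta]  # learned per-weight step-sizes


if __name__ == "__main__":
    alphas = run_demo()
    print("relevant:  ", [round(a, 3) for a in alphas[:5]])
    print("irrelevant:", [round(a, 3) for a in alphas[5:]])
```

In this sketch each weight carries its own step-size, and that step-size is itself adjusted by gradient descent on the squared prediction error. On a task like the one in `run_demo`, one would expect the step-sizes of the inputs whose target weights keep changing to end up larger than those of the irrelevant inputs, which is the kind of behaviour that makes meta-descent appealing in non-stationary settings.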

