Online Newton Step Algorithm with Estimated Gradient

by Binbin Liu, et al.

Online learning with limited information feedback (bandit feedback) addresses the setting in which an online learner receives only partial feedback from the environment during learning. In this setting, Flaxman et al. [2005] extend the classical Online Gradient Descent (OGD) algorithm of Zinkevich [2003] by proposing the Online Gradient Descent with Expected Gradient (OGDEG) algorithm. Specifically, OGDEG uses a simple trick to approximate the gradient of the loss function f_t by evaluating f_t at a single point, and Flaxman et al. [2005] bound its expected regret by O(T^{5/6}). Compared with first-order algorithms, second-order online learning algorithms such as Online Newton Step (ONS) Hazan et al. [2007] have been shown to significantly accelerate convergence in traditional online learning. Motivated by this, this paper exploits second-order information to speed up the convergence of OGDEG. In particular, we extend the ONS algorithm with the expected-gradient trick and develop a novel second-order online learning algorithm, Online Newton Step with Expected Gradient (ONSEG). Theoretically, we show that ONSEG reduces the expected regret of OGDEG from O(T^{5/6}) to O(T^{2/3}) in the bandit feedback scenario. Empirically, we demonstrate the advantages of the proposed algorithm on several real-world datasets.
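The abstract does not spell out the ONSEG update, so the following is only a minimal sketch of the two ingredients it combines, not the authors' algorithm. It assumes the one-point estimator of Flaxman et al. [2005], g_t = (d / delta) f_t(x_t + delta u_t) u_t with u_t uniform on the unit sphere, plugged into the standard ONS recursion A_t = A_{t-1} + g_t g_t^T, x_{t+1} = x_t - (1/gamma) A_t^{-1} g_t. The function names and the parameters delta, gamma, eps and radius are hypothetical. The feasible set is assumed to be a Euclidean ball, and the A_t-norm projection used by ONS is replaced by a plain Euclidean projection for simplicity.

```python
import numpy as np


def one_point_gradient(f, x, delta, rng):
    """One-point gradient estimate in the style of Flaxman et al. [2005].

    Draws u uniformly from the unit sphere, queries the loss once at
    x + delta * u, and returns (d / delta) * f(x + delta * u) * u. In
    expectation this equals the gradient of the delta-smoothed loss
    f_hat(x) = E_v[f(x + delta * v)], v uniform in the unit ball.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # normalized Gaussian = uniform direction on the sphere
    y = x + delta * u       # the only point at which the loss is evaluated
    value = f(y)
    return (d / delta) * value * u, y, value


def project_to_ball(x, r):
    """Euclidean projection onto {x : ||x|| <= r} (a simplifying assumption)."""
    norm = np.linalg.norm(x)
    return x if norm <= r else x * (r / norm)


def ons_with_estimated_gradient(losses, x0, delta, gamma, eps, radius, seed=0):
    """Hypothetical ONS loop driven by one-point gradient estimates.

    losses: sequence of callables f_t(x) -> float (bandit feedback: values only).
    The iterate stays in the shrunken ball of radius (radius - delta), so every
    queried point x + delta * u lies inside the ball of radius `radius`.
    """
    rng = np.random.default_rng(seed)
    x = project_to_ball(np.asarray(x0, dtype=float), radius - delta)
    d = x.shape[0]
    A = eps * np.eye(d)  # A_0 = eps * I keeps A_t invertible
    observed = []
    for f in losses:
        g, _, value = one_point_gradient(f, x, delta, rng)
        observed.append(value)
        A += np.outer(g, g)                      # rank-one second-order update
        x = x - np.linalg.solve(A, g) / gamma    # Newton-style step A_t^{-1} g_t
        x = project_to_ball(x, radius - delta)   # Euclidean, not A_t-norm, projection
    return x, observed
```

For example, `ons_with_estimated_gradient([lambda x: float(np.sum((x - z) ** 2))] * 500, np.zeros(2), delta=0.1, gamma=0.5, eps=1.0, radius=1.0)` runs the loop on a fixed quadratic loss centered at some point `z` while observing only loss values. Solving with A instead of forming its inverse is the usual numerically safer choice. The known ONS implementations also maintain A_t^{-1} directly via the Sherman-Morrison formula to avoid the O(d^3) solve per round.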
