An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits

02/20/2017
by Yevgeny Seldin, et al.
We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime, the strategy reduces the dependence of the regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime, the regret guarantee remains unchanged.
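For readers unfamiliar with the setting, the following is a minimal sketch of plain EXP3 on losses in [0, 1], the exponential-weights baseline that EXP3++ builds on. It is not the EXP3++ algorithm and not the gap estimator proposed in the paper: the learning-rate schedule, the function names `exp3` and `naive_gaps`, and the Bernoulli test problem are all assumptions made for illustration. The helper `naive_gaps` only shows what a "gap" is (the difference between an arm's average loss and the best arm's average loss), computed here from importance-weighted loss estimates.

```python
import math
import random


def exp3(loss_fn, n_arms, horizon, rng):
    """Plain EXP3 on losses in [0, 1] (illustrative baseline, not EXP3++).

    Returns the total loss suffered and the importance-weighted
    cumulative loss estimate of every arm.
    """
    est = [0.0] * n_arms  # importance-weighted cumulative loss estimates
    total_loss = 0.0
    for t in range(1, horizon + 1):
        # Assumed anytime learning rate; the paper may use a different constant.
        eta = math.sqrt(math.log(n_arms) / (t * n_arms))
        low = min(est)  # shift the exponent for numerical stability
        weights = [math.exp(-eta * (l - low)) for l in est]
        z = sum(weights)
        probs = [w / z for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        # Dividing by the sampling probability keeps the estimate unbiased.
        est[arm] += loss / probs[arm]
    return total_loss, est


def naive_gaps(est, horizon):
    """Hypothetical helper: empirical gaps from the loss estimates."""
    best = min(est)
    return [(l - best) / horizon for l in est]


# Hypothetical stochastic instance: Bernoulli losses, minimal gap 0.3.
rng = random.Random(0)
means = [0.2, 0.5, 0.8]
horizon = 2000
total_loss, est = exp3(
    lambda t, arm: 1.0 if rng.random() < means[arm] else 0.0,
    len(means),
    horizon,
    rng,
)
gaps = naive_gaps(est, horizon)
```

In this toy instance the minimal gap is 0.3, the quantity denoted by Δ in the abstract; smaller gaps make the best arm harder to identify, which is why the stochastic-regime bounds depend on Δ.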
