Kernel-based methods for bandit convex optimization

07/11/2016
by Sébastien Bubeck, et al.

We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n) √T-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves Õ(n^9.5 √T)-regret, and we show that a simple variant of this algorithm can be run in poly(n log(T))-time per step at the cost of an additional poly(n) T^o(1) factor in the regret. These results improve upon the Õ(n^11 √T)-regret and exp(poly(T))-time result of the first two authors, and the log(T)^poly(n) √T-regret and log(T)^poly(n)-time result of Hazan and Li. Furthermore, we conjecture that another variant of the algorithm could achieve Õ(n^1.5 √T)-regret, and moreover that this regret is unimprovable (the current best lower bound is Ω(n √T), and it is achieved with linear functions). For the simpler situation of zeroth-order stochastic convex optimization, this corresponds to the conjecture that the optimal query complexity is of order n^3 / ϵ^2.
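The link between the conjectured Õ(n^1.5 √T) regret and the n^3 / ϵ^2 query complexity can be sketched with the usual regret-to-optimization-error heuristic. The block below is only an illustration under stated assumptions: logarithmic factors are ignored, C is a hypothetical absolute constant, R_T denotes the regret after T rounds, and x̄_T denotes the average of the points played (none of this notation appears in the abstract).

```latex
% Assumption: regret bound with a hypothetical constant C, log factors dropped.
R_T \le C\, n^{1.5} \sqrt{T}
% Averaging the played points (convexity of f) gives an optimization error
% of at most the average regret:
\mathbb{E}\, f(\bar{x}_T) - \min_{x} f(x) \;\le\; \frac{R_T}{T} \;\le\; \frac{C\, n^{1.5}}{\sqrt{T}}
% Requiring this to be at most \epsilon and solving for T:
\frac{C\, n^{1.5}}{\sqrt{T}} \le \epsilon
\quad\Longleftrightarrow\quad
T \;\ge\; \frac{C^{2}\, n^{3}}{\epsilon^{2}}
```

Under these assumptions, about n^3 / ϵ^2 noisy function evaluations suffice to reach accuracy ϵ, which matches the order stated in the conjecture. The same calculation applied to the proven Õ(n^9.5 √T) bound would give a query complexity of order n^19 / ϵ^2, again up to logarithmic factors.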
