Online learning with kernel losses
We present a generalization of the adversarial linear bandits framework, where the underlying losses are kernel functions (with an associated reproducing kernel Hilbert space) rather than linear functions. We study a version of the exponential weights algorithm and bound its regret in this setting. Under conditions on the eigendecay of the kernel, we provide a sharp characterization of the regret for this algorithm. When we have polynomial eigendecay (μ_j ≤ O(j^{-β})), we find that the regret is bounded by R_n ≤ O(n^{β/(2(β-1))}); while under the assumption of exponential eigendecay (μ_j ≤ O(e^{-βj})), we get an even tighter bound on the regret, R_n ≤ O(n^{1/2} log(n)^{1/2}). We also study the full information setting when the underlying losses are kernel functions, and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.
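For intuition about the exponential weights scheme the abstract refers to, the sketch below runs it in the simpler full-information setting over a finite discretization of the action set, with each round's loss given by a kernel evaluation. This is a minimal illustration under stated assumptions: the RBF kernel, the discretization, the learning rate, and all function names are illustrative choices, not the paper's bandit algorithm or its RKHS machinery.

import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2); an assumed choice of kernel."""
    return np.exp(-gamma * np.sum((x - y) ** 2, axis=-1))

def exponential_weights(actions, loss_fns, eta):
    """Full-information exponential weights over a finite action set.

    actions:  (K, d) array, a discretization of the action set.
    loss_fns: list of per-round loss functions, each mapping (K, d) -> (K,).
    eta:      learning rate, e.g. sqrt(8 * log(K) / n).
    """
    K = len(actions)
    cum_loss = np.zeros(K)
    total = 0.0
    for loss in loss_fns:
        # Play a distribution proportional to exp(-eta * cumulative loss);
        # subtracting the minimum is a standard numerical-stability shift.
        w = np.exp(-eta * (cum_loss - cum_loss.min()))
        p = w / w.sum()
        losses = loss(actions)
        total += p @ losses   # expected loss of the randomized play this round
        cum_loss += losses    # full information: all action losses are observed
    return total, cum_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, d, n = 200, 2, 500
    actions = rng.uniform(-1, 1, size=(K, d))
    # Each round's loss is a kernel function x -> k(x, z_t) for an
    # adversarially chosen point z_t (here just sampled at random).
    targets = rng.uniform(-1, 1, size=(n, d))
    loss_fns = [lambda a, z=z: rbf_kernel(a, z) for z in targets]
    eta = np.sqrt(8 * np.log(K) / n)
    total, cum = exponential_weights(actions, loss_fns, eta)
    print(f"algorithm loss: {total:.2f}, best fixed action: {cum.min():.2f}")

The printed gap between the algorithm's cumulative loss and that of the best fixed action is the regret this toy run incurs; the paper's bounds concern how this gap scales with n under the stated eigendecay conditions.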