Smooth Bandit Optimization: Generalization to Hölder Space

12/11/2020
by Yusha Liu, et al.

We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for α-Hölder continuous (including Lipschitz) functions with 0<α≤1. Our main contribution is the generalization of the reward function to Hölder spaces with exponent α>1, which bridges the gap between Lipschitz bandits and infinitely-differentiable models such as linear bandits. For Hölder continuous functions (0<α≤1), approaches based on random sampling in bins of a discretized domain suffice to achieve the optimal rate. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in bins. We demonstrate that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of Õ(T^{(d+α)/(d+2α)}) for α>1, which matches the existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of Hölder spaces indexed by α, using a bandit model selection approach applied to our proposed two-layer algorithms. We show that it achieves a regret rate that matches the existing lower bound for adaptation within the α≤1 subset.
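
To make the stated rate concrete, the short sketch below evaluates the exponent (d+α)/(d+2α) from the bound above for a few smoothness levels. It is only an illustration: the function name `regret_exponent` and the chosen values of d and α are assumptions for this example, not part of the paper. At α=1 the exponent equals (d+1)/(d+2), and as α grows it approaches 1/2, which is the sense in which the result bridges Lipschitz bandits and linear bandits.

```python
from fractions import Fraction


def regret_exponent(d, alpha):
    """Exponent of T in the bound T^((d + alpha) / (d + 2 * alpha)).

    `d` is the domain dimension and `alpha` the Hölder exponent.
    Both names are illustrative; exact rationals avoid rounding noise.
    """
    d = Fraction(d)
    alpha = Fraction(alpha)
    if d < 1 or alpha <= 0:
        raise ValueError("need d >= 1 and alpha > 0")
    return (d + alpha) / (d + 2 * alpha)


if __name__ == "__main__":
    # Hypothetical smoothness levels for a one-dimensional domain.
    for alpha in (Fraction(1, 2), 1, 2, 10, 1000):
        exponent = regret_exponent(1, alpha)
        print(f"d=1, alpha={alpha}: exponent = {exponent} (~{float(exponent):.4f})")
```

The exponent decreases monotonically in α for fixed d, so a smoother reward function yields a smaller regret rate under this bound.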


research
01/29/2023

Smooth Non-Stationary Bandits

In many applications of online decision making, the environment is non-s...
research
10/19/2021

Batched Lipschitz Bandits

In this paper, we study the batched Lipschitz bandit problem, where the ...
research
04/26/2023

Adaptation to Misspecified Kernel Regularity in Kernelised Bandits

In continuum-armed bandit problems where the underlying function resides...
research
09/05/2019

Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

We study a nonparametric contextual bandit problem where the expected re...
research
12/14/2022

Invariant Lipschitz Bandits: A Side Observation Approach

Symmetry arises in many optimization and decision-making problems, and h...
research
05/24/2019

Polynomial Cost of Adaptation for X-Armed Bandits

In the context of stochastic continuum-armed bandits, we present an algo...
research
05/29/2023

Robust Lipschitz Bandits to Adversarial Corruptions

Lipschitz bandit is a variant of stochastic bandits that deals with a co...
