A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

07/27/2023
by Zhihan Xiong, et al.

We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X} \subset \mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\{\theta_t\}_{t=1}^T$, an algorithm aims to correctly identify the best arm $x^* := \arg\max_{x \in \mathcal{X}} x^\top \sum_{t=1}^T \theta_t$ with probability as high as possible. Prior work has addressed the stationary setting, where $\theta_t = \theta_1$ for all $t$, and demonstrated that the error probability decreases as $\exp(-T/\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world A/B/n multivariate testing scenarios that motivate our work, the environment is non-stationary, and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time step, then the error probability decreases as $\exp(-T \Delta_{(1)}^2 / d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T} \sum_{t=1}^T \theta_t$. As there exist environments where $\Delta_{(1)}^2 / d \ll 1/\rho^*$, we are motivated to propose a novel algorithm, P1-RAGE, that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of P1-RAGE and demonstrate empirically that the algorithm indeed never performs worse than the G-optimal design, while comparing favorably to the best algorithms in the stationary setting.
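
The robust baseline mentioned in the abstract (non-adaptive sampling from a G-optimal design) can be illustrated with a minimal sketch. This is not P1-RAGE, whose details are not given here. The sketch assumes the arms span $\mathbb{R}^d$, computes the design with a Frank-Wolfe iteration (an assumed choice of solver, relying on the Kiefer-Wolfowitz equivalence of G- and D-optimal designs), and recommends an arm from an inverse-propensity-style estimate of $\frac{1}{T}\sum_t \theta_t$. The function names, the toy arm set, the noise level, and the two-phase parameter sequence are all hypothetical.

```python
import numpy as np

def g_optimal_design(X, tol=1e-2, max_iter=10000):
    """Frank-Wolfe sketch for a G-optimal design over the rows of X.

    Assumes the rows of X span R^d. Stops once max_x x^T A^{-1} x <= d(1+tol),
    using the Kiefer-Wolfowitz fact that the optimal value equals d.
    """
    n, d = X.shape
    lam = np.full(n, 1.0 / n)                        # start from the uniform design
    for _ in range(max_iter):
        A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
        g = np.einsum("ij,jk,ik->i", X, A_inv, X)    # x^T A(lam)^{-1} x per arm
        i = int(np.argmax(g))
        if g[i] <= d * (1.0 + tol):
            break
        step = (g[i] / d - 1.0) / (g[i] - 1.0)       # exact line search for log det
        lam = (1.0 - step) * lam
        lam[i] += step
    return lam / lam.sum()

def robust_bai(X, thetas, lam, rng, noise_sd=0.1):
    """Pull arms i.i.d. from lam, then recommend argmax_x x^T theta_hat.

    theta_hat = (1/T) sum_t A(lam)^{-1} x_t r_t is unbiased for the average
    parameter (1/T) sum_t theta_t, whatever the sequence thetas is.
    """
    T = thetas.shape[0]
    A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
    pulls = rng.choice(len(X), size=T, p=lam)
    rewards = np.einsum("td,td->t", X[pulls], thetas)
    rewards = rewards + noise_sd * rng.standard_normal(T)
    theta_hat = A_inv @ (X[pulls] * rewards[:, None]).mean(axis=0)
    return int(np.argmax(X @ theta_hat))

# Hypothetical toy instance: three basis arms plus one mixed arm.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.5]])
T = 5000
# Non-stationary parameters: arm 1 looks best early, arm 0 is best on average.
thetas = np.vstack([np.tile([0.0, 1.0, 0.0], (T // 2, 1)),
                    np.tile([2.0, 0.0, 0.0], (T // 2, 1))])
rng = np.random.default_rng(0)
lam = g_optimal_design(X)
recommended = robust_bai(X, thetas, lam, rng)
true_best = int(np.argmax(X @ thetas.sum(axis=0)))
```

In this toy instance an algorithm that committed to the arm that looked best during the first half of the budget would pick the wrong arm, whereas the non-adaptive estimate targets the average parameter directly, which is the kind of robustness the abstract attributes to the G-optimal baseline.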
