On the complexity of All ε-Best Arms Identification

02/13/2022
by Aymen Al Marjani, et al.

We consider the problem, introduced by <cit.>, of identifying all the ε-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. In the fixed-confidence setting, we give a lower bound on the number of samples required by any algorithm that returns the set of ε-good arms with failure probability at most some risk level δ. This bound takes the form T_ε^*(μ)log(1/δ), where T_ε^*(μ) is a characteristic time that depends on the vector of mean rewards μ and the accuracy parameter ε. We also provide an efficient numerical method for solving the convex max-min program that defines the characteristic time. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, which makes our bound tighter than the one provided by <cit.>. Using this method, we propose a Track-and-Stop algorithm that identifies the set of ε-good arms with high probability and is asymptotically optimal (as δ goes to zero) in terms of expected sample complexity. Finally, through numerical simulations, we demonstrate our algorithm's advantage over state-of-the-art methods, even for moderate values of the risk parameter.
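For reference, here is a sketch of the form such characteristic times typically take in the fixed-confidence literature; this is an illustration under the assumption of unit-variance Gaussian rewards, not the paper's exact statement:

T_ε^*(μ)^{-1} = sup_{w ∈ Δ_K} inf_{λ ∈ Alt_ε(μ)} ∑_{a=1}^{K} w_a (μ_a − λ_a)^2 / 2,

where Δ_K is the simplex of sampling proportions over the K arms and Alt_ε(μ) denotes the alternative instances λ whose set of ε-good arms differs from that of μ. The paper's complete characterization of this alternative set is what makes the inner infimum, and hence the full max-min program, amenable to efficient numerical solution.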
