Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandit
We consider the problem of finding, through adaptive sampling, which of n populations (arms) has the largest mean. Our objective is to determine a rule which identifies the best population with a fixed minimum confidence using as few observations as possible, i.e. fixed-confidence (FC) best arm identification (BAI) in multi-armed bandits. We study such problems under the Bayesian setting with both Bernoulli and Gaussian populations. We propose to use the classical vector at a time (VT) rule, which samples each alive population once in each round. We show how VT can be implemented and analyzed in our Bayesian setting and be improved by early elimination. We also propose and analyze a variant of the classical play the winner (PW) algorithm. Numerical results show that these rules compare favorably with state-of-art algorithms.
READ FULL TEXT