Asymptotically Optimal Multi-Armed Bandit Activation Policies under Side Constraints
This paper introduces the first asymptotically optimal strategy for the multi-armed bandit (MAB) problem under side constraints. The side constraints model situations in which bandit activations are not cost-free but incur known, bandit-dependent costs (i.e., they consume different resources), and the controller is constrained at all times by limited resource availability. The main result involves the derivation of an asymptotic lower bound on the regret of feasible uniformly fast policies and the construction of policies that achieve this lower bound under pertinent conditions. Further, we provide the explicit form of such policies for the case in which the unknown distributions are Normal with unknown means and known variances, and for the case of arbitrary discrete distributions with finite support.
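To make the setting concrete, the following is a minimal, illustrative sketch of a budget-constrained bandit run for the Normal case with known variance, using a generic cost-normalized UCB-style index. This is not the paper's asymptotically optimal policy; the function name, index form, and parameters are assumptions chosen for illustration only.

```python
import numpy as np

def run_constrained_ucb(means, sigma, costs, budget, horizon, seed=0):
    """Illustrative sketch: Normal arms with known variance sigma^2,
    known bandit-dependent activation costs, and a total resource budget.
    The index below is a generic assumption, not the paper's policy."""
    rng = np.random.default_rng(seed)
    k = len(means)
    pulls = np.zeros(k)   # activation counts per bandit
    sums = np.zeros(k)    # cumulative observed rewards per bandit
    spent = 0.0           # resource consumed so far
    total_reward = 0.0

    for t in range(1, horizon + 1):
        # Side constraint: only bandits whose activation cost still fits
        # within the remaining budget are feasible at this step.
        feasible = [i for i in range(k) if spent + costs[i] <= budget]
        if not feasible:
            break
        untried = [i for i in feasible if pulls[i] == 0]
        if untried:
            # Activate each feasible bandit once before using the index.
            i = untried[0]
        else:
            # UCB-style index with known variance, normalized by the
            # activation cost so exploration accounts for resource use.
            idx = [
                (sums[i] / pulls[i]
                 + sigma * np.sqrt(2 * np.log(t) / pulls[i])) / costs[i]
                for i in feasible
            ]
            i = feasible[int(np.argmax(idx))]
        reward = rng.normal(means[i], sigma)  # Normal arm, known variance
        pulls[i] += 1
        sums[i] += reward
        spent += costs[i]
        total_reward += reward
    return total_reward, pulls

# Example: three bandits with different mean rewards and activation costs.
reward, pulls = run_constrained_ucb(
    means=[1.0, 1.2, 0.8], sigma=1.0,
    costs=[1.0, 1.5, 0.7], budget=200.0, horizon=500)
print(reward, pulls)
```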