A General Framework of Multi-Armed Bandit Processes by Switching Restrictions
This paper proposes a general framework of multi-armed bandit (MAB) processes in which the arms evolve in continuous time and the switches among arms are subject to a type of restriction. The Gittins index process is developed for any single arm subject to restrictions on its stopping times, and the optimality of the corresponding Gittins index rule is then established. The Gittins indices defined in this paper are consistent with those for MAB processes in continuous time, in discrete time, and in the semi-Markovian setting, so the new theory covers the classical models as special cases and also applies to many other situations not yet treated in the literature. While the proof of the optimality of Gittins index policies draws on ideas from the existing theory of MAB processes in continuous time, it introduces new techniques that drastically simplify the proof.
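To make the notion of a Gittins index rule concrete, the sketch below covers only the classical discrete-time special case that the abstract says the new theory includes: a single arm modeled as a finite-state Markov chain with reward `rewards[j]` in state `j`, transition matrix `P`, and discount factor `beta`. It is not the paper's continuous-time construction with switching restrictions. The function name `gittins_indices` and its parameters are hypothetical, and the index is computed with the restart-in-state formulation of Katehakis and Veinott (an established method for this classical case, swapped in here for illustration), under which the reward-rate index of state `i` equals `(1 - beta)` times the value at `i` of the problem where one may, at any time, restart the arm from `i`.

```python
from typing import List


def gittins_indices(rewards: List[float], P: List[List[float]], beta: float,
                    tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Reward-rate Gittins index of every state of a finite discrete-time Markov arm.

    Hypothetical helper: classical discounted setting only, computed via the
    restart-in-state formulation and value iteration.
    """
    n = len(rewards)
    indices = []
    for i in range(n):
        # V[j]: value of the restart-in-i problem when the arm is in state j
        V = [0.0] * n
        for _ in range(max_iter):
            # Value of playing the arm once from each state, then continuing optimally
            cont = [rewards[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                    for j in range(n)]
            # In every state, either continue from there or restart from state i
            new_V = [max(cont[j], cont[i]) for j in range(n)]
            diff = max(abs(a - b) for a, b in zip(new_V, V))
            V = new_V
            if diff < tol:  # the Bellman operator is a beta-contraction
                break
        # Convert the restart value at state i into a reward-rate index
        indices.append((1 - beta) * V[i])
    return indices


# Arm that pays 0 now and then moves to an absorbing state paying 1 forever
print(gittins_indices([0.0, 1.0], [[0.0, 1.0], [0.0, 1.0]], beta=0.9))
```

In this example the index of the first state is approximately `beta = 0.9` (playing forever earns `beta / (1 - beta)` in discounted reward over `1 / (1 - beta)` in discounted time), and the index of the absorbing state is 1. In the classical setting, the Gittins index rule then plays, at each decision time, an arm whose current state has the largest index.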