Top-Two Thompson Sampling for Contextual Top-m_c Selection Problems
We aim to efficiently allocate a fixed simulation budget to identify the top-m_c designs for each context c among a finite number of contexts. The performance of each design under a context is measured by an identifiable statistical characteristic, possibly in the presence of nuisance parameters. Under a Bayesian framework, we extend the top-two Thompson sampling method, originally designed for selecting the best design in a single context, to contextual top-m_c selection problems. The result is an efficient sampling policy that allocates simulation samples across both contexts and designs simultaneously. To demonstrate the asymptotic optimality of the proposed sampling policy, we characterize the exponential convergence rate of the posterior distribution for a wide range of identifiable sampling distribution families. We prove that the proposed sampling policy is consistent and that it asymptotically satisfies a necessary condition for optimality. In particular, when selecting the contextual best designs (i.e., m_c = 1 for every context), the policy is proved to be asymptotically optimal. Numerical experiments demonstrate that the proposed sampling policy performs well in finite samples.
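For orientation, the single-context top-two Thompson sampling method that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration and not the contextual top-m_c policy proposed in the paper. It assumes Gaussian observations with known standard deviation `sigma` and a flat prior, so that the posterior of each design mean is normal around its sample mean. The function name `ttts_best_design` and the parameters `beta` and `max_resample` are hypothetical names chosen for this sketch.

```python
import math
import random


def ttts_best_design(true_means, budget, beta=0.5, sigma=1.0, seed=0, max_resample=100):
    """Single-context top-two Thompson sampling sketch (best-design selection).

    Assumptions (not from the paper): Gaussian noise with known sigma, flat prior,
    at least two designs, and budget >= number of designs.
    """
    rng = random.Random(seed)
    k = len(true_means)

    # Initialize with one simulation sample per design.
    counts = [1] * k
    sums = [rng.gauss(mu, sigma) for mu in true_means]

    def posterior_draw():
        # Posterior of each mean: Normal(sample mean, sigma^2 / n).
        return [rng.gauss(sums[i] / counts[i], sigma / math.sqrt(counts[i]))
                for i in range(k)]

    def argmax(values, exclude=None):
        candidates = [i for i in range(len(values)) if i != exclude]
        return max(candidates, key=lambda i: values[i])

    for _ in range(budget - k):
        # Leader: the design that looks best under one posterior draw.
        leader = argmax(posterior_draw())
        chosen = leader
        if rng.random() >= beta:
            # Challenger: redraw until a different design comes out on top.
            for _ in range(max_resample):
                challenger = argmax(posterior_draw())
                if challenger != leader:
                    chosen = challenger
                    break
            else:
                # Fallback after many identical draws: best non-leader in a fresh draw.
                chosen = argmax(posterior_draw(), exclude=leader)
        # Spend one unit of budget on the chosen design.
        sums[chosen] += rng.gauss(true_means[chosen], sigma)
        counts[chosen] += 1

    sample_means = [sums[i] / counts[i] for i in range(k)]
    return argmax(sample_means), counts


best, counts = ttts_best_design([0.0, 1.0, 5.0], budget=300)
print(best, counts)
```

With probability `beta` the sample goes to the leader and otherwise to the challenger, which keeps the policy from spending nearly all of its budget on the apparent best design. Extending this idea to several contexts and to top-m_c sets requires the additional allocation rules developed in the paper, which this sketch does not attempt to reproduce.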