Practical Evaluation and Optimization of Contextual Bandit Algorithms

02/12/2018
by Alberto Bietti, et al.

We study and empirically optimize contextual bandit learning, exploration, and problem encodings across 500+ datasets, creating a reference for practitioners and discovering or reinforcing a number of natural open problems for researchers. Across these experiments we show that minimizing the amount of exploration is a key design goal for practical performance. Remarkably, many problems can be solved purely via the implicit exploration imposed by the diversity of contexts. For practitioners, we introduce a number of practical improvements to common exploration algorithms including Bootstrap Thompson sampling, Online Cover, and ϵ-greedy. We also detail a new form of reduction to regression for learning from exploration data. Overall, this is a thorough study and review of contextual bandit methodology.
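As a rough illustration of the kind of exploration discussed above, the sketch below implements ϵ-greedy action selection together with an importance-weighted regression update on the observed action's cost. This is a minimal, assumed example in plain Python, not the authors' implementation or the paper's new reduction; the class name `EpsilonGreedyCB` and parameters such as `epsilon` and `lr` are illustrative.

```python
import numpy as np

class EpsilonGreedyCB:
    """Minimal epsilon-greedy contextual bandit with per-action linear regressors."""

    def __init__(self, num_actions, dim, epsilon=0.05, lr=0.1):
        self.num_actions = num_actions
        self.epsilon = epsilon                         # exploration probability
        self.lr = lr                                   # regression learning rate
        self.weights = np.zeros((num_actions, dim))    # one linear cost regressor per action

    def predict_costs(self, context):
        # Estimated cost of each action for this context.
        return self.weights @ context

    def act(self, context, rng):
        costs = self.predict_costs(context)
        greedy = int(np.argmin(costs))
        # With probability epsilon play a uniformly random action, otherwise greedy.
        if rng.random() < self.epsilon:
            action = int(rng.integers(self.num_actions))
        else:
            action = greedy
        # Probability the played action had under the policy; logged so the
        # regression update can be importance weighted.
        prob = self.epsilon / self.num_actions + (1.0 - self.epsilon) * (action == greedy)
        return action, prob

    def update(self, context, action, cost, prob):
        # Importance-weighted squared-loss update on the observed action only:
        # scaling by 1/prob keeps the regressor an unbiased estimate of the cost.
        pred = self.weights[action] @ context
        self.weights[action] -= self.lr * (pred - cost) * context / prob


# Toy usage on synthetic contexts (assumed setup, for illustration only).
rng = np.random.default_rng(0)
learner = EpsilonGreedyCB(num_actions=3, dim=5)
for _ in range(1000):
    x = rng.normal(size=5)
    a, p = learner.act(x, rng)
    cost = float(a != int(np.argmax(x[:3])))  # correct action has cost 0, others cost 1
    learner.update(x, a, cost, p)
```

Keeping ϵ small (or even zero, relying on the diversity of contexts for implicit exploration) is consistent with the abstract's observation that minimizing exploration tends to help practical performance.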

