Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits

06/26/2023
by   Yuwei Luo, et al.
0

This paper is motivated by recent developments in the linear bandit literature, which have revealed a discrepancy between the promising empirical performance of algorithms such as Thompson sampling and Greedy, when compared to their pessimistic theoretical regret bounds. The challenge arises from the fact that while these algorithms may perform poorly in certain problem instances, they generally excel in typical instances. To address this, we propose a new data-driven technique that tracks the geometry of the uncertainty ellipsoid, enabling us to establish an instance-dependent frequentist regret bound for a broad class of algorithms, including Greedy, OFUL, and Thompson sampling. This result empowers us to identify and “course-correct" instances in which the base algorithms perform poorly. The course-corrected algorithms achieve the minimax optimal regret of order 𝒪̃(d√(T)), while retaining most of the desirable properties of the base algorithms. We present simulation results to validate our findings and compare the performance of our algorithms with the baselines.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset