Optimal Clustering with Bandit Feedback

by   Junwen Yang, et al.

This paper considers the problem of online clustering with bandit feedback. A set of arms (or items) can be partitioned into various groups that are unknown. Within each group, the observations associated to each of the arms follow the same distribution with the same mean vector. At each time step, the agent queries or pulls an arm and obtains an independent observation from the distribution it is associated to. Subsequent pulls depend on previous ones as well as the previously obtained samples. The agent's task is to uncover the underlying partition of the arms with the least number of arm pulls and with a probability of error not exceeding a prescribed constant δ. The problem proposed finds numerous applications from clustering of variants of viruses to online market segmentation. We present an instance-dependent information-theoretic lower bound on the expected sample complexity for this task, and design a computationally efficient and asymptotically optimal algorithm, namely Bandit Online Clustering (BOC). The algorithm includes a novel stopping rule for adaptive sequential testing that circumvents the need to exactly solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.


page 1

page 2

page 3

page 4


Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection

In the Best-k-Arm problem, we are given n stochastic bandit arms, each a...

A/B/n Testing with Control in the Presence of Subpopulations

Motivated by A/B/n testing applications, we consider a finite set of dis...

Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm

We study the K-armed dueling bandit problem, a variation of the standard...

A unified framework for bandit multiple testing

In bandit multiple hypothesis testing, each arm corresponds to a differe...

An Optimal Private Stochastic-MAB Algorithm Based on an Optimal Private Stopping Rule

We present a provably optimal differentially private algorithm for the s...

The Max K-Armed Bandit: A PAC Lower Bound and tighter Algorithms

We consider the Max K-Armed Bandit problem, where a learning agent is fa...

Adaptive Estimation of Random Vectors with Bandit Feedback

We consider the problem of sequentially learning to estimate, in the mea...

Please sign up or login with your details

Forgot password? Click here to reset