Dominate or Delete: Decentralized Competing Bandits with Uniform Valuation
We study regret minimization in a two-sided matching market where uniformly valued demand-side agents (a.k.a. agents) continuously compete to be matched with supply-side agents (a.k.a. arms) with unknown and heterogeneous valuations. Such markets abstract online matching platforms (e.g., UpWork, TaskRabbit) and fall within the purview of the matching bandit models introduced in Liu et al. <cit.>. The uniform valuation on the demand side admits a unique stable matching equilibrium in the system. We design the first decentralized algorithm for matching bandits under uniform valuation that does not require any knowledge of the reward gaps or the time horizon, thus partially resolving an open question in <cit.>. The algorithm works in phases of exponentially increasing length. In each phase i, an agent first deletes dominated arms, i.e., the arms preferred by agents ranked higher than itself. Deletion is followed by dynamic explore-exploit using a UCB algorithm on the remaining arms for 2^i rounds. Finally, the preferred arm is broadcast in a decentralized fashion to the other agents through pure exploitation over (N-1)K rounds, with N agents and K arms. Comparing the obtained reward against that under the unique stable matching, we show that the algorithm achieves O(log(T)/Δ^2) regret over T rounds, where Δ is the minimum reward gap across all agents and arms. We also provide an (orderwise) matching regret lower bound.
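The phase structure described above can be illustrated with a minimal sketch. The Python snippet below shows one agent's loop: skip dominated arms, run UCB on the remaining arms for 2^i rounds, then purely exploit the empirically best arm for (N-1)K rounds so that other agents can observe it. The reward model (Bernoulli), all names, and the stubbed-out communication step are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def run_agent(true_means, num_agents, horizon, rng=None):
    """Single-agent sketch of the phase-based dominate-and-delete + UCB scheme.

    true_means : per-arm mean rewards for this agent (unknown to the agent).
    Returns the sequence of arms pulled.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    K = len(true_means)
    pulls = np.zeros(K)
    reward_sums = np.zeros(K)
    history = []
    t, phase = 0, 1
    dominated = set()  # arms claimed by higher-ranked agents (learned via broadcast)

    while t < horizon:
        active = [a for a in range(K) if a not in dominated]

        # Explore-exploit with UCB on the non-dominated arms for 2^phase rounds.
        for _ in range(2 ** phase):
            if t >= horizon:
                break
            ucb = np.array([
                np.inf if pulls[a] == 0
                else reward_sums[a] / pulls[a] + np.sqrt(2 * np.log(t + 1) / pulls[a])
                for a in active
            ])
            arm = active[int(np.argmax(ucb))]
            reward = rng.binomial(1, true_means[arm])  # assumed Bernoulli rewards
            pulls[arm] += 1
            reward_sums[arm] += reward
            history.append(arm)
            t += 1

        # Communication block: (N-1)K rounds of pure exploitation of the
        # empirically best arm, which implicitly broadcasts it to other agents.
        best = max(active, key=lambda a: reward_sums[a] / max(pulls[a], 1))
        for _ in range((num_agents - 1) * K):
            if t >= horizon:
                break
            history.append(best)
            t += 1

        # In the full decentralized protocol, the arms preferred by higher-ranked
        # agents would be added to `dominated` here; this single-agent sketch
        # leaves that hook empty.
        phase += 1

    return history
```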