Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints

08/24/2023
by Hanchi Huang, et al.

We propose a novel master-slave architecture to solve the top-K combinatorial multi-armed bandit problem with non-linear bandit feedback and diversity constraints. To the best of our knowledge, this is the first combinatorial bandit setting that considers diversity constraints under bandit feedback. To explore the combinatorial and constrained action space efficiently, we introduce six slave models, each with distinct strengths, that generate diversified samples balancing reward, constraint satisfaction, and efficiency. We further propose teacher-learning-based optimization and a policy co-training technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best one, as estimated by a neural contextual UCB-based network, to make a decision that trades off exploration and exploitation. Thanks to the design of the slave models, the co-training mechanism among them, and the novel interactions between the master and slave models, our approach significantly surpasses existing state-of-the-art algorithms on both synthetic and real datasets for recommendation tasks. The code is available at: <https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits>.
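The propose-then-select loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: it uses two hypothetical proposers (one greedy, one random) in place of the six slave models, a simple count-based UCB score in place of the neural contextual UCB network, and an assumed diversity constraint of at most `max_per_category` arms per category. All function names, parameters, and the reward function are assumptions made for illustration.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Action = Tuple[int, ...]  # a sorted tuple of K arm indices


def is_feasible(action: Sequence[int], category: Sequence[int],
                max_per_category: int) -> bool:
    """Assumed diversity constraint: at most max_per_category arms per category."""
    counts = {}
    for arm in action:
        counts[category[arm]] = counts.get(category[arm], 0) + 1
    return all(c <= max_per_category for c in counts.values())


def greedy_proposer(values: Sequence[float], k: int, category: Sequence[int],
                    max_per_category: int) -> Optional[Action]:
    """Hypothetical proposer: add arms by estimated value while staying feasible."""
    chosen: List[int] = []
    for arm in sorted(range(len(values)), key=lambda a: -values[a]):
        if is_feasible(chosen + [arm], category, max_per_category):
            chosen.append(arm)
        if len(chosen) == k:
            return tuple(sorted(chosen))
    return None


def random_proposer(rng: random.Random, n_arms: int, k: int,
                    category: Sequence[int], max_per_category: int,
                    tries: int = 200) -> Optional[Action]:
    """Hypothetical proposer: rejection-sample a feasible K-subset."""
    for _ in range(tries):
        action = tuple(sorted(rng.sample(range(n_arms), k)))
        if is_feasible(action, category, max_per_category):
            return action
    return None


def ucb_score(action: Action, values: Sequence[float], counts: Sequence[int],
              t: int, c: float = 1.0) -> float:
    """Count-based UCB stand-in for the neural contextual UCB estimator."""
    mean = sum(values[a] for a in action) / len(action)
    least_tried = min(counts[a] for a in action)
    return mean + c * math.sqrt(math.log(t + 1) / (least_tried + 1))


def master_select(candidates, values, counts, t) -> Action:
    """Master step: keep the proposals that exist and pick the best UCB score."""
    feasible = [a for a in candidates if a is not None]
    return max(feasible, key=lambda a: ucb_score(a, values, counts, t))


def run(n_rounds: int = 200, seed: int = 0):
    rng = random.Random(seed)
    n_arms, k, max_per_category = 8, 3, 2
    category = [0, 0, 0, 0, 1, 1, 2, 2]
    quality = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]  # hidden from the learner
    values = [0.0] * n_arms  # running average of set rewards credited to each arm
    counts = [0] * n_arms
    action: Action = ()
    for t in range(1, n_rounds + 1):
        candidates = [greedy_proposer(values, k, category, max_per_category)]
        candidates += [random_proposer(rng, n_arms, k, category, max_per_category)
                       for _ in range(3)]
        action = master_select(candidates, values, counts, t)
        # Non-linear bandit feedback: one noisy scalar for the whole set.
        reward = 1.0 - math.prod(1.0 - quality[a] for a in action)
        reward += rng.gauss(0.0, 0.05)
        for a in action:
            counts[a] += 1
            values[a] += (reward - values[a]) / counts[a]
    return action, values, counts
```

Calling `run()` returns the last chosen action together with the per-arm estimates and pull counts. Crediting the full set reward to every arm in the set is a crude simplification chosen to keep the sketch short; the paper's approach instead learns a reward estimator over whole actions.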


research
02/11/2015

Combinatorial Bandits Revisited

This paper investigates stochastic and adversarial combinatorial multi-a...
research
09/16/2020

Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback

Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed ...
research
12/19/2016

Corralling a Band of Bandit Algorithms

We study the problem of combining multiple bandit algorithms (that is, o...
research
01/30/2023

A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback

We investigate the problem of stochastic, combinatorial multi-armed band...
research
03/29/2022

On Kernelized Multi-Armed Bandits with Constraints

We study a stochastic bandit problem with a general unknown reward funct...
research
05/10/2023

Efficient Training of Multi-task Neural Solver with Multi-armed Bandits

Efficiently training a multi-task neural solver for various combinatoria...
research
10/23/2019

Diversifying Database Activity Monitoring with Bandits

Database activity monitoring (DAM) systems are commonly used by organiza...
