Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models

05/12/2018
by   Yining Wang, et al.

In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection-based algorithm achieves an item-independent regret bound of O(√(T log log T)), which matches information-theoretic lower bounds up to iterated logarithmic terms. Our proof technique draws on tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
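In the uncapacitated MNL setting, attention can be restricted to revenue-ordered assortments (offer every item whose revenue exceeds a threshold), which reduces assortment optimization to maximizing a one-dimensional unimodal potential, the structure the trisection algorithm exploits. The sketch below is a minimal offline illustration with hypothetical helper names (`mnl_expected_revenue`, `revenue_ordered`, `trisection_max`), assuming known preference weights; it is not the paper's bandit algorithm, which must estimate these weights from censored choice feedback.

```python
def mnl_expected_revenue(assortment, revenues, prefs):
    """Expected revenue under the MNL model: item i in assortment S is
    purchased with probability v_i / (1 + sum_{j in S} v_j); the remaining
    mass goes to the no-purchase option."""
    denom = 1.0 + sum(prefs[i] for i in assortment)
    return sum(revenues[i] * prefs[i] for i in assortment) / denom

def revenue_ordered(theta, revenues):
    """Revenue-ordered assortment: all items with per-unit revenue >= theta."""
    return [i for i, r in enumerate(revenues) if r >= theta]

def trisection_max(f, lo, hi, iters=60):
    """Ternary search for the maximizer of a unimodal function f on [lo, hi]:
    each step discards the third of the interval that cannot contain it."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1  # maximizer lies to the right of m1
        else:
            hi = m2  # maximizer lies to the left of m2
    return (lo + hi) / 2.0
```

In the paper the trisection runs over candidate revenue levels of a (smooth, unimodal) potential function rather than directly over assortments; the generic `trisection_max` above shows only the interval-shrinking mechanic.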


Related research

- Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem (12/16/2011). This paper is devoted to regret lower bounds in the classical model of s...
- Old Dog Learns New Tricks: Randomized UCB for Bandit Problems (10/11/2019). We propose RandUCB, a bandit strategy that uses theoretically derived co...
- Multinomial Logit Contextual Bandits: Provable Optimality and Practicality (03/25/2021). We consider a sequential assortment selection problem where the user cho...
- On Regret with Multiple Best Arms (06/26/2020). We study regret minimization problem with the existence of multiple best...
- Functional Sequential Treatment Allocation with Covariates (01/29/2020). We consider a multi-armed bandit problem with covariates. Given a realiz...
- Beyond the Click-Through Rate: Web Link Selection with Multi-level Feedback (05/04/2018). The web link selection problem is to select a small subset of web links ...
- Whittle Index for A Class of Restless Bandits with Imperfect Observations (08/09/2021). We consider a class of restless bandit problems that finds a broad appli...
