Multi-objective Contextual Bandit Problem with Similarity Information

03/11/2018
by Eralp Turğay, et al.

In this paper we propose the multi-objective contextual bandit problem with similarity information. This problem extends the classical contextual bandit problem with similarity information by introducing multiple, possibly conflicting objectives. Since the best arm in each objective can differ given the context, learning the best arm with respect to a single objective can jeopardize the rewards obtained in the other objectives. To evaluate the performance of the learner in this setup, we use a performance metric called the contextual Pareto regret: the sum of the distances of the arms chosen by the learner to the context-dependent Pareto front. For this problem, we develop a new online learning algorithm called Pareto Contextual Zooming (PCZ), which exploits the idea of contextual zooming to learn the arms that are close to the Pareto front for each observed context by adaptively partitioning the joint context-arm set according to the observed rewards and the locations of the context-arm pairs selected in the past. We then prove that PCZ achieves Õ(T^((1+d_p)/(2+d_p))) Pareto regret, where d_p is the Pareto zooming dimension, which depends on the size of the set of near-optimal context-arm pairs. Moreover, we show that this regret bound is nearly optimal by providing an almost matching Ω(T^((1+d_p)/(2+d_p))) lower bound.
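To make the Pareto notions concrete, here is a minimal, context-free sketch (not code from the paper): given estimated expected reward vectors for a finite set of arms, it computes the set of non-dominated arms (the Pareto front) and an arm's Pareto suboptimality gap, i.e., the smallest amount that must be added to every objective before the arm becomes non-dominated. All names and the example reward matrix are hypothetical.

```python
import numpy as np

def pareto_front(mu):
    """Return indices of non-dominated arms.

    mu: array of shape (n_arms, n_objectives) of expected rewards.
    Arm a is dominated if some arm v is >= in every objective and
    strictly > in at least one.
    """
    n = len(mu)
    front = []
    for a in range(n):
        dominated = any(
            np.all(mu[v] >= mu[a]) and np.any(mu[v] > mu[a])
            for v in range(n) if v != a
        )
        if not dominated:
            front.append(a)
    return front

def pareto_gap(mu, a):
    """Smallest eps >= 0 such that mu[a] + eps (added to every
    objective) is no longer dominated by any other arm."""
    gaps = [float(np.min(mu[v] - mu[a])) for v in range(len(mu)) if v != a]
    return max(0.0, max(gaps))

# Hypothetical 4-arm, 2-objective instance: the first three arms
# trade off the two objectives; the fourth is dominated.
mu = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.5, 0.5],
               [0.2, 0.2]])
print(pareto_front(mu))     # the three non-dominated arms
print(pareto_gap(mu, 3))    # positive gap for the dominated arm
```

The Pareto regret over a horizon is then the sum of `pareto_gap` values of the arms played; in the contextual problem the reward vectors, and hence the front and the gaps, depend on the observed context.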

Related research

05/30/2019 · Multi-Objective Generalized Linear Bandits
In this paper, we study the multi-objective bandits (MOB) problem, where...

07/26/2019 · Lexicographic Multiarmed Bandit
We consider a multiobjective multiarmed bandit problem with lexicographi...

07/01/2019 · Exploiting Relevance for Online Decision-Making in High-Dimensions
Many sequential decision-making tasks require choosing at each decision ...

03/11/2018 · Combinatorial Multi-Objective Multi-Armed Bandit Problem
In this paper, we introduce the COmbinatorial Multi-Objective Multi-Arme...

02/01/2021 · Doubly Robust Thompson Sampling for linear payoffs
A challenging aspect of the bandit problem is that a stochastic reward i...

02/04/2022 · Tsetlin Machine for Solving Contextual Bandit Problems
This paper introduces an interpretable contextual bandit algorithm using...

03/05/2018 · Costs and Rewards in Priced Timed Automata
We consider Pareto analysis of reachable states of multi-priced timed au...
