Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward

09/17/2020
by   Baihan Lin, et al.

We considered a novel, practical problem of online learning with episodically revealed rewards, motivated by several real-world applications in which the contexts are nonstationary across episodes and reward feedback is not always available to the decision-making agent. For this online semi-supervised learning setting, we introduced Background Episodic Reward LinUCB (BerlinUCB), a solution that incorporates clustering as a self-supervision module to provide useful side information when rewards are not observed. Our experiments on a variety of datasets, covering six scenarios in both stationary and nonstationary environments, demonstrated clear advantages of the proposed approach over the standard contextual bandit. Lastly, we introduced a relevant real-life example where this problem setting is especially useful.
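
To make the setting concrete, here is a minimal sketch of a LinUCB-style agent that falls back on a clustering step when the reward is not revealed. It is not the authors' BerlinUCB implementation: the class name `EpisodicLinUCB`, the per-arm running centroids, and the covariance-only update on unlabeled rounds are all assumptions chosen for illustration.

```python
import math


def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class EpisodicLinUCB:
    """LinUCB with a hypothetical clustering fallback for rounds without reward."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        # Per arm: inverse of A = I + sum(x x^T), and b = sum(reward * x).
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)]
                       for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]
        # Assumption: one running centroid per arm acts as a crude online clustering.
        self.centroids = [[0.0] * dim for _ in range(n_arms)]
        self.counts = [0] * n_arms

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A_inv, b in zip(self.A_inv, self.b):
            Ax = _matvec(A_inv, x)
            mean = _dot(Ax, b)  # theta^T x, using the symmetry of A_inv
            scores.append(mean + self.alpha * math.sqrt(_dot(x, Ax)))
        return max(range(len(scores)), key=scores.__getitem__)

    def _rank_one_update(self, arm, x):
        # Sherman-Morrison update of A_inv after A <- A + x x^T.
        A_inv = self.A_inv[arm]
        Ax = _matvec(A_inv, x)
        denom = 1.0 + _dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                A_inv[i][j] -= Ax[i] * Ax[j] / denom

    def _update_centroid(self, arm, x):
        self.counts[arm] += 1
        c = self.centroids[arm]
        for i, xi in enumerate(x):
            c[i] += (xi - c[i]) / self.counts[arm]

    def update(self, arm, x, reward=None):
        """Update the model; return the arm updated, or None if nothing changed."""
        if reward is not None:
            # Supervised round: the usual LinUCB update for the pulled arm.
            self._rank_one_update(arm, x)
            self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]
            if reward > 0:
                self._update_centroid(arm, x)
            return arm
        # Unsupervised round (assumption): assign x to the nearest known
        # cluster and update only that arm's covariance, leaving b untouched.
        seen = [k for k, n in enumerate(self.counts) if n > 0]
        if not seen:
            return None
        nearest = min(seen, key=lambda k: sum(
            (xi - ci) ** 2 for xi, ci in zip(x, self.centroids[k])))
        self._rank_one_update(nearest, x)
        self._update_centroid(nearest, x)
        return nearest
```

In this sketch an unlabeled round only narrows the confidence width of the arm whose cluster the context falls into (and, as a side effect of the ridge form, shrinks that arm's estimate slightly); other choices of self-supervision signal are equally plausible.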


research
10/23/2020

Online Semi-Supervised Learning with Bandit Feedback

We formulate a new problem at the intersection of semi-supervised learnin...
research
06/08/2020

Speaker Diarization as a Fully Online Learning Problem in MiniVox

We proposed a novel AI framework to conduct real-time multi-speaker diar...
research
02/03/2018

Adaptive Representation Selection in Contextual Bandit with Unlabeled History

We consider an extension of the contextual bandit setting, motivated by ...
research
03/18/2020

Self-Supervised Contextual Bandits in Computer Vision

Contextual bandits are a common problem faced by machine learning practi...
research
12/12/2020

Semi-supervised reward learning for offline reinforcement learning

In offline reinforcement learning (RL) agents are trained using a logged...
research
04/28/2020

A Linear Bandit for Seasonal Environments

Contextual bandit algorithms are extremely popular and widely used in re...
research
02/20/2015

Contextual Semibandits via Supervised Learning Oracles

We study an online decision making problem where on each round a learner...
