Contextual Markov Decision Processes using Generalized Linear Models

03/14/2019
by   Aditya Modi, et al.

We consider the recently proposed reinforcement learning (RL) framework of Contextual Markov Decision Processes (CMDPs), in which the agent has a sequence of episodic interactions with tabular environments chosen from a possibly infinite set. The parameters of these environments depend on a context vector that is available to the agent at the start of each episode. In this paper, we propose a no-regret online RL algorithm for the setting where the MDP parameters are obtained from the context using generalized linear models (GLMs). The proposed algorithm, GL-ORL, relies on efficient online updates and is also memory efficient; it uses efficient Online Newton Step updates to build confidence sets. Our analysis gives new results in the logit-link case and improves previous bounds in the linear case. Moreover, for any strongly convex link function, we show a generic conversion from any no-regret online algorithm to confidence sets.
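The core modeling assumption above is that each environment's parameters are a GLM function of the observed context. As a minimal illustrative sketch (not the paper's algorithm), the snippet below shows how a logit-type link could map a context vector to a next-state distribution for one (state, action) pair; the function name, weight matrix `W`, and dimensions are all hypothetical choices for illustration.

```python
import numpy as np

def glm_transition_probs(context, W):
    """Map a context vector to a next-state distribution for one
    (state, action) pair via a multinomial-logit GLM.
    W has shape (num_next_states, context_dim); each row is a
    hypothetical per-next-state weight vector."""
    logits = W @ context
    z = np.exp(logits - logits.max())  # numerically stable softmax
    return z / z.sum()

rng = np.random.default_rng(0)
context = rng.normal(size=4)      # context revealed at episode start
W = rng.normal(size=(3, 4))       # 3 candidate next states
p = glm_transition_probs(context, W)
```

In the paper's setting the agent does not know `W`; GL-ORL maintains online estimates of such parameters and confidence sets around them to drive optimistic exploration.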


