Online Learning in Adversarial MDPs: Is the Communicating Case Harder than Ergodic?

11/03/2021
by Gautam Chandrasekaran, et al.

We study online learning in adversarial communicating Markov Decision Processes (MDPs) with full information. When the transitions are deterministic, we give an algorithm that achieves O(√T) regret with respect to the best fixed deterministic policy in hindsight, and we prove a regret lower bound in this setting that matches it up to polynomial factors in the MDP parameters. We also give an inefficient algorithm that achieves O(√T) regret in general communicating MDPs, under an additional mild restriction on the transition dynamics.
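For concreteness, the Python sketch below illustrates the regret benchmark from the abstract: a learner's cumulative reward is compared against the best fixed deterministic policy in hindsight, in an MDP with deterministic transitions and full-information adversarial rewards. This is a toy illustration under stated assumptions, not the paper's algorithm; the specific MDP, the reward sequence, and the naive uniform-random learner are all illustrative choices, not taken from the paper.

import itertools
import random

n_states, n_actions, T = 3, 2, 1000

# Deterministic transition function: next_state = transitions[s][a].
# This small instance is communicating (every state reaches every other).
transitions = [[(s + a + 1) % n_states for a in range(n_actions)]
               for s in range(n_states)]

# Adversarially chosen reward functions r_t(s, a), revealed in full after
# each round (full-information setting). Here: fixed-seed random stand-in.
rng = random.Random(0)
rewards = [[[rng.random() for _ in range(n_actions)]
            for _ in range(n_states)] for _ in range(T)]

def rollout_return(policy, start=0):
    """Total reward of a fixed deterministic policy. Its trajectory is
    deterministic because the transitions are deterministic."""
    s, total = start, 0.0
    for t in range(T):
        a = policy[s]
        total += rewards[t][s][a]
        s = transitions[s][a]
    return total

# Enumerate all deterministic policies (feasible only for tiny MDPs).
all_policies = list(itertools.product(range(n_actions), repeat=n_states))
best_in_hindsight = max(rollout_return(p) for p in all_policies)

# A naive learner (uniformly random actions) played on the same rewards.
s, learner_total = 0, 0.0
for t in range(T):
    a = rng.randrange(n_actions)
    learner_total += rewards[t][s][a]
    s = transitions[s][a]

regret = best_in_hindsight - learner_total
print(f"best fixed policy: {best_in_hindsight:.1f}, "
      f"learner: {learner_total:.1f}, regret: {regret:.1f}")

The point of the benchmark is visible in the sketch: because switching policies changes the state the learner occupies, one cannot simply run an experts algorithm over the policy set, which is what makes the communicating case delicate.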


