Reinforcement Learning of Markov Decision Processes with Peak Constraints

01/23/2019
by Ather Gattami, et al.

In this paper, we consider reinforcement learning of Markov Decision Processes (MDPs) with peak constraints, where an agent chooses a policy to optimize an objective while satisfying additional constraints. The agent has to take actions based on the observed states, reward outputs, and constraint outputs, without any knowledge of the dynamics, the reward functions, or the constraint functions. We introduce a game-theoretic approach to constructing reinforcement learning algorithms in which the agent maximizes an unconstrained objective that depends on the simulated action of a minimizing opponent, which acts on a finite set of actions and on the output data of the constraint functions (rewards). We show that the policies obtained from maximin Q-learning converge to the optimal policies. To the best of our knowledge, these are the first learning algorithms guaranteed to converge to optimal stationary policies for the MDP problem with peak constraints, for both discounted and expected average rewards.
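The maximin idea in the abstract can be illustrated with a small tabular sketch. This is not the paper's algorithm, only one plausible reading under stated assumptions: the simulated opponent picks an index `o`, where `o = 0` selects the objective reward and `o >= 1` selects a constraint output; the agent observes all outputs at every step; and exploration is uniformly random. The function `maximin_q_learning`, the environment `toy_step`, and all numeric values are hypothetical.

```python
import numpy as np


def maximin_q_learning(step, n_states, n_actions, n_outputs,
                       gamma=0.9, alpha=0.1, n_steps=5000, seed=0):
    """Tabular maximin Q-learning sketch (illustrative assumption only).

    Q[s, a, o] is the value when the agent plays a and the simulated
    opponent selects output index o (0 = reward, 1.. = constraints).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions, n_outputs))
    s = 0
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))      # uniform exploration (off-policy)
        s_next, outputs = step(s, a)          # observe reward and constraint outputs
        # Value of the next state: agent maximizes, opponent minimizes.
        v_next = Q[s_next].min(axis=1).max()
        # The opponent is simulated, so every opponent entry is updated at once.
        Q[s, a] += alpha * (np.asarray(outputs) + gamma * v_next - Q[s, a])
        s = s_next
    # Greedy maximin policy: best worst-case action in each state.
    policy = Q.min(axis=2).argmax(axis=1)
    return Q, policy


def toy_step(state, action):
    """Hypothetical one-state environment.

    Action 0 pays a higher reward but has a negative constraint output;
    action 1 pays less but keeps the constraint output positive.
    """
    outputs = [(1.0, -1.0), (0.5, 0.2)][action]
    return 0, outputs


Q, policy = maximin_q_learning(toy_step, n_states=1, n_actions=2, n_outputs=2)
print(policy)
```

In this toy setting the learned policy prefers action 1, because the opponent would otherwise select the negative constraint output of action 0. How the opponent's payoff should be defined so that the maximin policy is optimal for the original constrained problem is the subject of the paper itself and is not captured by this sketch.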


research
06/12/2021

Markov Decision Processes with Long-Term Average Constraints

We consider the problem of constrained Markov Decision Process (CMDP) wh...
research
02/27/2020

Learning in Markov Decision Processes under Constraints

We consider reinforcement learning (RL) in Markov Decision Processes (MD...
research
10/02/2019

Formal Language Constraints for Markov Decision Processes

In order to satisfy safety conditions, a reinforcement learned (RL) agen...
research
01/14/2018

Deep Reinforcement Fuzzing

Fuzzing is the process of finding security vulnerabilities in input-proc...
research
12/09/2021

Reinforcement Learning with Almost Sure Constraints

In this work we address the problem of finding feasible policies for Con...
research
01/02/2022

Reinforcement Learning for Task Specifications with Action-Constraints

In this paper, we use concepts from supervisory control theory of discre...
research
05/15/2019

Exploration-Exploitation Trade-off in Reinforcement Learning on Online Markov Decision Processes with Global Concave Rewards

We consider an agent who is involved in a Markov decision process and re...
