Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes

06/12/2023
by   Adrian Müller, et al.
0

Constrained Markov Decision Processes (CMDPs) are one of the common ways to model safe reinforcement learning problems, where the safety objectives are modeled by constraint functions. Lagrangian-based dual or primal-dual algorithms provide efficient methods for learning in CMDPs. For these algorithms, the currently known regret bounds in the finite-horizon setting allow for a cancellation of errors; that is, one can compensate for a constraint violation in one episode with a strict constraint satisfaction in another episode. However, in practical applications, we do not consider such a behavior safe. In this paper, we overcome this weakness by proposing a novel model-based dual algorithm OptAug-CMDP for tabular finite-horizon CMDPs. Our algorithm is motivated by the augmented Lagrangian method and can be performed efficiently. We show that during K episodes of exploring the CMDP, our algorithm obtains a regret of Õ(√(K)) for both the objective and the constraint violation. Unlike existing Lagrangian approaches, our algorithm achieves this regret without the need for the cancellation of errors.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/27/2023

Safe Posterior Sampling for Constrained MDPs with Bounded Constraint Violation

Constrained Markov decision processes (CMDPs) model scenarios of sequent...
research
12/28/2021

Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations

We consider large-scale Markov decision processes with an unknown cost f...
research
05/31/2023

Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning

We examine online safe multi-agent reinforcement learning using constrai...
research
08/03/2009

Regret Bounds for Opportunistic Channel Access

We consider the task of opportunistic channel access in a primary system...
research
06/13/2023

Provably Learning Nash Policies in Constrained Markov Potential Games

Multi-agent reinforcement learning (MARL) addresses sequential decision-...
research
05/07/2020

A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes

The canonical solution methodology for finite constrained Markov decisio...

Please sign up or login with your details

Forgot password? Click here to reset