Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes

06/12/2023


Constrained Markov Decision Processes (CMDPs) are a common way to model safe reinforcement learning problems, where the safety objectives are modeled by constraint functions. Lagrangian-based dual or primal-dual algorithms provide efficient methods for learning in CMDPs. For these algorithms, the currently known regret bounds in the finite-horizon setting allow for a cancellation of errors; that is, a constraint violation in one episode can be compensated for by strict constraint satisfaction in another episode. However, in practical applications, we do not consider such behavior safe. In this paper, we overcome this weakness by proposing a novel model-based dual algorithm, OptAug-CMDP, for tabular finite-horizon CMDPs. Our algorithm is motivated by the augmented Lagrangian method and can be performed efficiently. We show that during K episodes of exploring the CMDP, our algorithm obtains a regret of Õ(√K) for both the objective and the constraint violation. Unlike existing Lagrangian approaches, our algorithm achieves this regret without relying on the cancellation of errors.
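To make "cancellation of errors" concrete, the sketch below contrasts the two ways of accumulating constraint violation over K episodes. It assumes a single constraint of the form g(π_k) ≤ c, where g(π_k) is the expected constraint cost of the policy played in episode k. The function names and the cost values are hypothetical and chosen only for illustration; this shows the two regret notions, not the OptAug-CMDP algorithm itself.

```python
from typing import Sequence


def violation_with_cancellation(costs: Sequence[float], threshold: float) -> float:
    """Signed sum: slack in one episode offsets excess in another (hypothetical helper)."""
    return sum(cost - threshold for cost in costs)


def violation_without_cancellation(costs: Sequence[float], threshold: float) -> float:
    """Sum of positive parts: only episodes exceeding the threshold count (hypothetical helper)."""
    return sum(max(0.0, cost - threshold) for cost in costs)


# Hypothetical per-episode expected constraint costs with threshold c = 1.0:
# episodes alternate between violating and strictly satisfying the constraint.
episode_costs = [1.5, 0.5, 1.5, 0.5]

print(violation_with_cancellation(episode_costs, 1.0))     # 0.0
print(violation_without_cancellation(episode_costs, 1.0))  # 1.0
```

Under the signed sum, the alternating policy appears to incur no violation at all, even though it breaks the constraint in every other episode; the positive-part sum exposes those violations, which is why a bound on it is the stronger, cancellation-free guarantee.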



