Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation

07/06/2023
by Yu Chen, et al.

Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximations. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm ICVaR-L, which achieves an O(√(α^-(H+1)(d^2H^4+dH^6)K)) regret, where α is the risk level, d is the dimension of state-action features, H is the length of each episode, and K is the number of episodes. We also establish a matching lower bound Ω(√(α^-(H-1)d^2K)) to validate the optimality of ICVaR-L with respect to d and K. For ICVaR-RL with general function approximation, we propose algorithm ICVaR-G, which achieves an O(√(α^-(H+1)DH^4K)) regret, where D is a dimensional parameter that depends on the eluder dimension and covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
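The paper's Iterated CVaR objective applies the CVaR operator to value distributions at every decision step; the underlying quantity can be illustrated with a minimal empirical sketch (a hypothetical helper for intuition only, not the paper's algorithm): CVaR at risk level α is the mean of the worst α-fraction of outcomes, so small α focuses the objective on the most adverse tail.

```python
import math

def empirical_cvar(samples, alpha):
    """Empirical Conditional Value-at-Risk at risk level alpha.

    Returns the mean of the worst (smallest) ceil(alpha * n) outcomes,
    treating larger values as better (rewards). Illustrative only; the
    ICVaR-RL setting iterates a CVaR operator over value functions at
    each step of the horizon rather than over raw returns.
    """
    s = sorted(samples)
    k = max(1, math.ceil(alpha * len(s)))
    return sum(s[:k]) / k
```

For example, with ten equally likely returns 1 through 10 and α = 0.2, the empirical CVaR averages the two worst outcomes, yielding 1.5, whereas α = 1 recovers the ordinary expected value.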
