Choquet regularization for reinforcement learning
We propose Choquet regularizers to measure and manage the level of exploration in reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) by replacing the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton–Jacobi–Bellman equation of the problem and solve it explicitly in the linear–quadratic (LQ) case by statically maximizing a mean–variance-constrained Choquet regularizer. In the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers and, conversely, identify the Choquet regularizers that generate a number of widely used exploratory samplers such as ϵ-greedy, exponential, uniform and Gaussian.
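To make the notion of a Choquet regularizer concrete, the following is a minimal numerical sketch. It assumes a definition the abstract itself does not spell out: for a concave distortion function h with h(0) = h(1) = 0, the regularizer of a distribution is the integral over the real line of h(P(X ≥ x)). The function names, the distortion h(p) = p − p², and the integration grid are illustrative choices, not taken from the paper.

```python
import numpy as np

def choquet_regularizer(h, survival, grid):
    """Approximate the integral of h(P(X >= x)) dx over `grid` (trapezoid rule).

    Assumed definition: h is concave with h(0) = h(1) = 0.
    """
    y = h(survival(grid))
    return float(np.sum((y[1:] + y[:-1]) * np.diff(grid)) / 2.0)

def h_gini(p):
    # Illustrative concave distortion with h(0) = h(1) = 0.
    return p - p**2

def uniform_survival(x, shift=0.0):
    # P(X >= x) for X uniform on [shift, shift + 1].
    return np.clip(1.0 - (x - shift), 0.0, 1.0)

def exponential_survival(x):
    # P(X >= x) for X exponential with rate 1.
    return np.where(x < 0.0, 1.0, np.exp(-np.maximum(x, 0.0)))

# A wide, fine grid so that truncation and discretization errors are negligible.
grid = np.linspace(-5.0, 45.0, 500001)

phi_uniform = choquet_regularizer(h_gini, uniform_survival, grid)
phi_shifted = choquet_regularizer(
    h_gini, lambda x: uniform_survival(x, shift=3.0), grid
)
phi_exponential = choquet_regularizer(h_gini, exponential_survival, grid)

print(phi_uniform, phi_shifted, phi_exponential)
```

For this particular h the integrals have closed forms: 1/6 for the uniform distribution on a unit interval and 1/2 for the rate-1 exponential. Shifting the uniform distribution leaves the value unchanged, which is consistent with reading the regularizer as a measure of spread (the level of exploration) rather than of location.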