Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning
In this work, we propose a novel Kernelized Stein Discrepancy-based Posterior Sampling for RL algorithm which extends model-based RL based upon posterior sampling (PSRL) in several ways: we (i) relax the need for any smoothness or Gaussian assumptions, allowing for complex mixture models; (ii) ensure it is applicable to large-scale training by incorporating a compression step such that the posterior consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) develop a novel regret analysis of PSRL based upon integral probability metrics, which, under a smoothness condition on the constructed posterior, can be evaluated in closed form as the kernelized Stein discrepancy (KSD). Consequently, we are able to improve the 𝒪(H^3/2 d√T) regret of PSRL to 𝒪(H^3/2 √T), where d is the input dimension, H is the episode length, and T is the total number of episodes experienced, alleviating a linear dependence on d. Moreover, we theoretically establish a trade-off between the regret rate and the posterior's representational complexity by introducing a KSD-based compression budget parameter ϵ, and establish a lower bound on the required complexity for consistency of the model. Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, with substantive improvements in computation time: it achieves up to a 50% reduction in wall-clock time in some continuous control environments.
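To make the closed-form KSD evaluation and the coreset compression step concrete, the sketch below computes the Langevin-Stein kernel for an RBF base kernel and greedily thins a sample set while its estimated KSD against the target stays within a compression budget ϵ. This is only a minimal illustration under the assumption that the score function ∇ log p of the target posterior is available; the function names (`stein_kernel_rbf`, `ksd_squared`, `compress_coreset`), the bandwidth `h`, and the greedy pruning rule are hypothetical stand-ins, not the paper's exact construction.

```python
import numpy as np

def stein_kernel_rbf(x, y, sx, sy, h=1.0):
    """Closed-form Langevin-Stein kernel k_p(x, y) for the RBF base
    kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)).

    sx, sy are the score vectors grad log p evaluated at x and y.
    """
    d = x.shape[0]
    diff = x - y
    sq = diff @ diff
    k = np.exp(-sq / (2.0 * h**2))
    grad_x_k = -(diff / h**2) * k           # gradient of k w.r.t. x
    grad_y_k = (diff / h**2) * k            # gradient of k w.r.t. y
    trace_term = (d / h**2 - sq / h**4) * k # trace of the mixed second derivative
    return (sx @ sy) * k + sx @ grad_y_k + sy @ grad_x_k + trace_term

def ksd_squared(X, scores, h=1.0):
    """V-statistic estimate of the squared KSD of the empirical measure
    on the rows of X against the target with the given scores."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += stein_kernel_rbf(X[i], X[j], scores[i], scores[j], h)
    return total / n**2

def compress_coreset(X, scores, eps, h=1.0):
    """Greedily drop points while the squared KSD of the retained
    coreset stays within the budget eps (an illustrative rule; the
    paper's exact thinning criterion may differ)."""
    keep = list(range(X.shape[0]))
    improved = True
    while improved and len(keep) > 1:
        improved = False
        for i in list(keep):
            trial = [j for j in keep if j != i]
            if ksd_squared(X[trial], scores[trial], h) <= eps:
                keep = trial
                improved = True
                break
    return keep

# Example usage: thin samples from a standard Gaussian target, whose
# score is s(x) = -x, so the KSD is available in closed form.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
coreset = compress_coreset(X, -X, eps=0.05)
```

The greedy loop above rescans all retained points after each removal, which is quadratic in the coreset size on top of the quadratic KSD estimate; it is adequate for illustration, while a practical implementation would update the KSD incrementally rather than recomputing the full double sum.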