Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning
In this work, we propose a novel Kernelized Stein Discrepancy-based Posterior Sampling for RL algorithm which extends model-based RL based upon posterior sampling (PSRL) in several ways: we (i) relax the need for any smoothness or Gaussian assumptions, allowing for complex mixture models; (ii) ensure it is applicable to large-scale training by incorporating a compression step such that the posterior consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) develop a novel regret analysis of PSRL based upon integral probability metrics, which, under a smoothness condition on the constructed posterior, can be evaluated in closed form as the kernelized Stein discrepancy (KSD). Consequently, we are able to improve the 𝒪(H^3/2 d√T) regret of PSRL to 𝒪(H^3/2 √T), where d is the input dimension, H is the episode length, and T is the total number of episodes experienced, alleviating a linear dependence on d. Moreover, we theoretically establish a trade-off between the regret rate and the posterior's representational complexity by introducing a KSD-based compression budget parameter ϵ, and establish a lower bound on the required complexity for consistency of the model. Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, with substantive improvements in computation time: it achieves up to a 50% reduction in wall-clock time in some continuous control environments.
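To make the closed-form KSD evaluation and the coreset compression step concrete, the sketch below computes the Langevin-Stein kernel for an RBF base kernel and greedily thins a sample set while its estimated KSD against the target stays within a compression budget ϵ. This is only a minimal illustration under the assumption that the score function ∇ log p of the target posterior is available; the function names (`stein_kernel_rbf`, `ksd_squared`, `compress_coreset`), the bandwidth `h`, and the greedy pruning rule are hypothetical stand-ins, not the paper's exact construction.

```python
import numpy as np

def stein_kernel_rbf(x, y, sx, sy, h=1.0):
    """Closed-form Langevin-Stein kernel k_p(x, y) for the RBF base
    kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)).

    sx, sy are the score vectors grad log p evaluated at x and y.
    """
    d = x.shape[0]
    diff = x - y
    sq = diff @ diff
    k = np.exp(-sq / (2.0 * h**2))
    grad_x_k = -(diff / h**2) * k           # gradient of k w.r.t. x
    grad_y_k = (diff / h**2) * k            # gradient of k w.r.t. y
    trace_term = (d / h**2 - sq / h**4) * k # trace of the mixed second derivative
    return (sx @ sy) * k + sx @ grad_y_k + sy @ grad_x_k + trace_term

def ksd_squared(X, scores, h=1.0):
    """V-statistic estimate of the squared KSD of the empirical measure
    on the rows of X against the target with the given scores."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += stein_kernel_rbf(X[i], X[j], scores[i], scores[j], h)
    return total / n**2

def compress_coreset(X, scores, eps, h=1.0):
    """Greedily drop points while the squared KSD of the retained
    coreset stays within the budget eps (an illustrative rule; the
    paper's exact thinning criterion may differ)."""
    keep = list(range(X.shape[0]))
    improved = True
    while improved and len(keep) > 1:
        improved = False
        for i in list(keep):
            trial = [j for j in keep if j != i]
            if ksd_squared(X[trial], scores[trial], h) <= eps:
                keep = trial
                improved = True
                break
    return keep

# Example usage: thin samples from a standard Gaussian target, whose
# score is s(x) = -x, so the KSD is available in closed form.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
coreset = compress_coreset(X, -X, eps=0.05)
```

The greedy loop above rescans all retained points after each removal, which is quadratic in the coreset size on top of the quadratic KSD estimate; it is adequate for illustration, while a practical implementation would update the KSD incrementally rather than recomputing the full double sum.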