A Contraction Approach to Model-based Reinforcement Learning
Model-based Reinforcement Learning has shown considerable experimental success, but a theoretical understanding of it is still lacking. To this end, we analyze the error in the cumulative reward for both stochastic and deterministic transitions using a contraction approach. We show that this approach does not require strong assumptions and recovers the typical error that is quadratic in the horizon. We prove that branched rollouts can reduce this error and are essential for deterministic transitions to admit a Bellman contraction. Our results also apply to Imitation Learning, where we prove that GAN-type learning outperforms Behavioral Cloning in continuous state and action spaces.
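For reference, the error that is "quadratic in the horizon" is the familiar simulation-lemma-style bound; one common form (written in our own illustrative notation, not necessarily the paper's) is

$$
\big| J_{\widehat{M}}(\pi) - J_{M}(\pi) \big| \;\le\; O\!\left( H^{2}\, r_{\max}\, \epsilon_{m} \right),
$$

where $J_{M}(\pi)$ is the cumulative reward of policy $\pi$ under the true dynamics $M$, $J_{\widehat{M}}(\pi)$ is its reward under the learned model $\widehat{M}$, $H$ is the horizon, $r_{\max}$ bounds the per-step reward, and $\epsilon_{m}$ bounds the one-step model error (e.g., in total variation). The abstract's claim is that a contraction argument recovers a bound of this type without strong assumptions.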