Explaining Off-Policy Actor-Critic From A Bias-Variance Perspective

10/06/2021
by Ting-Han Fan, et al.

Off-policy Actor-Critic algorithms have demonstrated strong empirical performance, but their success still lacks a satisfactory explanation. To this end, we show that their policy evaluation error on the distribution of transitions decomposes into three terms: a Bellman error, a bias from policy mismatch, and a variance term from sampling. By comparing the magnitudes of the bias and variance terms, we explain the success of Emphasizing Recent Experience (ERE) sampling and 1/age weighted sampling. Both sampling strategies yield smaller bias and variance than uniform sampling and are hence preferable to it.
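The abstract contrasts uniform replay sampling with two recency-favoring schemes. The sketch below is a minimal illustration under our own assumptions, not the paper's implementation. For 1/age weighting, we assume the newest transition has age 1 and the oldest has age N, and each transition is drawn with probability proportional to 1/age. For Emphasizing Recent Experience (ERE), we assume the window schedule c_k = max(N * eta^(k * 1000 / K), c_min) from the original ERE proposal, with eta = 0.996 and c_min = 2500 as assumed defaults, and sample uniformly from the most recent c_k transitions. All function names are hypothetical.

```python
import numpy as np


def one_over_age_probs(buffer_size: int) -> np.ndarray:
    """Sampling probabilities proportional to 1/age (assumed scheme).

    Index 0 is the oldest transition (age = buffer_size) and index
    buffer_size - 1 is the newest (age = 1).
    """
    ages = np.arange(buffer_size, 0, -1, dtype=float)  # oldest -> N, newest -> 1
    weights = 1.0 / ages
    return weights / weights.sum()


def ere_window(buffer_size: int, k: int, num_updates: int,
               eta: float = 0.996, c_min: int = 2500) -> int:
    """Size of the ERE-style recency window for the k-th of num_updates updates.

    Assumed schedule: c_k = max(N * eta**(k * 1000 / K), c_min),
    clipped so it never exceeds the number of stored transitions.
    """
    c_k = int(buffer_size * eta ** (k * 1000 / num_updates))
    return max(min(c_k, buffer_size), min(c_min, buffer_size))


def sample_indices(rng: np.random.Generator, buffer_size: int, batch_size: int,
                   scheme: str, k: int = 0, num_updates: int = 1) -> np.ndarray:
    """Draw replay-buffer indices under one of three hypothetical schemes."""
    if scheme == "uniform":
        # Every stored transition is equally likely.
        return rng.integers(0, buffer_size, size=batch_size)
    if scheme == "one_over_age":
        # Recent transitions are more likely, but old ones keep nonzero mass.
        return rng.choice(buffer_size, size=batch_size,
                          p=one_over_age_probs(buffer_size))
    if scheme == "ere":
        # Uniform sampling restricted to the most recent c_k transitions.
        c_k = ere_window(buffer_size, k, num_updates)
        return rng.integers(buffer_size - c_k, buffer_size, size=batch_size)
    raise ValueError(f"unknown scheme: {scheme}")


rng = np.random.default_rng(0)
batch = sample_indices(rng, buffer_size=10_000, batch_size=256, scheme="one_over_age")
```

Both non-uniform schemes shift probability mass toward transitions collected by policies close to the current one. Under the abstract's decomposition, this is the mechanism by which they reduce the policy-mismatch bias relative to uniform sampling.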
