Behavior Constraining in Weight Space for Offline Reinforcement Learning

07/12/2021
by Phillip Swazinna, et al.

In offline reinforcement learning, a policy must be learned from a single pre-collected dataset. Policies are therefore typically regularized during training to behave similarly to the data-generating policy, by adding a penalty based on a divergence between the action distributions of the generating and the trained policy. We propose a new algorithm that instead constrains the policy directly in its weight space, and demonstrate its effectiveness in experiments.
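The contrast between the two regularization styles can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function names, the mean-squared stand-in for an action-distribution divergence, and the squared-L2 distance to an anchor (e.g. a behavior-cloned policy's weights) are all assumptions made for this example.

```python
import numpy as np

def action_divergence_penalty(pi_actions, beta_actions):
    # Typical offline-RL regularizer: penalize a divergence between the
    # trained policy's actions and the behavior policy's actions on
    # dataset states (mean-squared difference used here as a simple
    # stand-in for a distributional divergence).
    return float(np.mean((pi_actions - beta_actions) ** 2))

def weight_space_penalty(pi_weights, anchor_weights):
    # The alternative described in the abstract: constrain the policy
    # directly in weight space, here by penalizing squared L2 distance
    # of its parameters from an anchor (hypothetically, the weights of
    # a policy behavior-cloned from the dataset).
    return float(sum(np.sum((w - a) ** 2)
                     for w, a in zip(pi_weights, anchor_weights)))
```

A weight-space penalty needs no action samples from the behavior policy at training time, only a fixed set of anchor parameters, which is one practical motivation for constraining in weight space.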
