Reinforcement Learning from Demonstrations by Novel Interactive Expert and Application to Automatic Berthing Control Systems for Unmanned Surface Vessel

by Haoran Zhang, et al.

In this paper, two novel and practical Reinforcement Learning from Demonstration (RLfD) methods are developed and applied to automatic berthing control systems for Unmanned Surface Vessels (USVs). A new expert data generation method, called Model Predictive Based Expert (MPBE), combines Model Predictive Control (MPC) and Deep Deterministic Policy Gradient (DDPG). It is developed to provide high-quality supervision data for RLfD algorithms. A straightforward RLfD method, model predictive Deep Deterministic Policy Gradient (MP-DDPG), is introduced first. It replaces the RL agent with MPBE, which then interacts directly with the environment. The distribution mismatch problem of MP-DDPG is then analyzed, and two techniques that alleviate it are proposed. Furthermore, another novel RLfD algorithm built on MP-DDPG, called Self-Guided Actor-Critic (SGAC), is presented. SGAC effectively leverages MPBE by continuously querying it to generate high-quality expert data online. SGAC addresses the distribution mismatch problem, which otherwise leads to an unstable learning process, in a DAgger manner. In addition, a theoretical analysis is given to prove that the SGAC algorithm converges with guaranteed monotonic improvement. Simulation results verify the effectiveness of MP-DDPG and SGAC in accomplishing the ship berthing control task. They also show the advantages of SGAC compared with other typical reinforcement learning algorithms and with MP-DDPG.
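The abstract says SGAC handles distribution mismatch "in a DAgger manner". In the generic DAgger loop, the expert is queried on the states that the *learner* visits, and those labels are added to one growing dataset before each retraining step. The sketch below shows only that generic loop. It is not SGAC, which is an actor-critic method. Everything in it is a hypothetical assumption for illustration:

- a 1-D double-integrator berthing model (`step`, `DT`);
- a PD law named `expert` standing in for MPBE (MPC + DDPG);
- a linear least-squares learner (`fit_linear_policy`);
- the helper names `rollout` and `dagger`.

```python
import random

DT = 0.1  # hypothetical integration step [s]


def step(state, u):
    """Toy 1-D berthing model (double integrator, forward Euler).

    state = (p, v): p is the distance to the berth, v the approach speed.
    Illustrative assumption only, not the USV model from the paper.
    """
    p, v = state
    return (p + DT * v, v + DT * u)


def expert(state):
    """Hypothetical expert: a PD law standing in for MPBE (MPC + DDPG)."""
    p, v = state
    return -1.0 * p - 1.5 * v


def fit_linear_policy(dataset, ridge=1e-9):
    """Least-squares fit of u = w1 * p + w2 * v on (state, expert action) pairs."""
    spp = spv = svv = spu = svu = 0.0
    for (p, v), u in dataset:
        spp += p * p
        spv += p * v
        svv += v * v
        spu += p * u
        svu += v * u
    spp += ridge  # tiny ridge term keeps the 2x2 normal equations well-posed
    svv += ridge
    det = spp * svv - spv * spv
    # Solve the 2x2 normal equations with Cramer's rule.
    w1 = (spu * svv - spv * svu) / det
    w2 = (spp * svu - spv * spu) / det

    def policy(s):
        return w1 * s[0] + w2 * s[1]

    return policy, (w1, w2)


def rollout(policy, x0, horizon):
    """Run `policy` from state (x0, 0) and return the visited states and the final state."""
    states, s = [], (x0, 0.0)
    for _ in range(horizon):
        states.append(s)
        s = step(s, policy(s))
    return states, s


def dagger(n_iters=5, n_rollouts=4, horizon=60, seed=0):
    """Generic DAgger loop (with beta = 1 on the first pass, 0 afterwards)."""
    rng = random.Random(seed)
    dataset = []
    policy = expert  # first pass: roll out the expert itself
    weights = None
    for _ in range(n_iters + 1):
        for _ in range(n_rollouts):
            visited, _ = rollout(policy, rng.uniform(-3.0, 3.0), horizon)
            # Query the expert on states the *current* policy reaches, then aggregate.
            dataset.extend((s, expert(s)) for s in visited)
        # Retrain on all data collected so far; later rollouts use the learner.
        policy, weights = fit_linear_policy(dataset)
    return policy, weights, dataset
```

Calling `policy, (w1, w2), data = dagger()` should recover gains very close to the toy expert's `(-1.0, -1.5)`, because this stand-in expert is itself linear. With a nonlinear expert such as MPBE, the aggregation of learner-visited states is what keeps the training data matched to the learner's own state distribution.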




