Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation

by   Amir M. Soufi Enayati, et al.

Reinforcement learning demonstrates significant potential in automatically building control policies in numerous domains, but shows low efficiency when applied to robot manipulation tasks due to the curse of dimensionality. To facilitate the learning of such tasks, prior knowledge or heuristics that incorporate inherent simplification can effectively improve the learning performance. This paper aims to define and incorporate the natural symmetry present in physical robotic environments. Then, sample-efficient policies are trained by exploiting the expert demonstrations in symmetrical environments through an amalgamation of reinforcement and behavior cloning, which gives the off-policy learning process a diverse yet compact initiation. Furthermore, it presents a rigorous framework for a recent concept and explores its scope for robot manipulation tasks. The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle, in a simulation experiment study. A PID controller, which tracks the linear joint-space trajectories with hard-coded temporal logic to produce interim midpoints, is used to generate demonstrations in the study. The results of the study present the effect of the number of demonstrations and quantify the magnitude of behavior cloning to exemplify the possible improvement of model-free reinforcement learning in common manipulation tasks. A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.


page 1

page 6

page 9


Automata Guided Reinforcement Learning With Demonstrations

Tasks with complex temporal structures and long horizons pose a challeng...

Constrained-Space Optimization and Reinforcement Learning for Complex Tasks

Learning from Demonstration is increasingly used for transferring operat...

Action Priors for Large Action Spaces in Robotics

In robotics, it is often not possible to learn useful policies using pur...

On-Robot Policy Learning with O(2)-Equivariant SAC

Recently, equivariant neural network models have been shown to be useful...

Q-attention: Enabling Efficient Learning for Vision-based Robotic Manipulation

Despite the success of reinforcement learning methods, they have yet to ...

SQUIRL: Robust and Efficient Learning from Video Demonstration of Long-Horizon Robotic Manipulation Tasks

Recent advances in deep reinforcement learning (RL) have demonstrated it...

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

Imitation learning, followed by reinforcement learning algorithms, is a ...

Please sign up or login with your details

Forgot password? Click here to reset