Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

by Abhishek Gupta, et al.

Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications, but in practice the choice of reward function can be crucial for good results. While in principle the reward only needs to specify what the task is, in reality practitioners often need to design more detailed rewards that give the agent hints about how the task should be completed. This kind of “reward shaping” has often been discussed in the literature and is often a critical part of practical applications, but there is relatively little formal characterization of how the choice of reward shaping can yield benefits in sample complexity. In this work, we build on the framework of novelty-based exploration to provide a simple scheme for incorporating shaped rewards into RL, along with an analysis tool showing that particular choices of reward shaping provably improve sample efficiency. We characterize the class of problems where these gains are expected to be significant and show how this can be connected to practical algorithms in the literature. We confirm that these results hold in practice in an experimental evaluation, providing insight into the mechanisms through which reward shaping can significantly improve the sample complexity of reinforcement learning while retaining asymptotic performance.




