The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions

06/17/2023
by Nishil Patel, et al.

Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here, we propose a solvable high-dimensional model of RL that can capture a variety of learning protocols, and derive its typical dynamics as a set of closed-form ordinary differential equations (ODEs). We derive optimal schedules for the learning rates and task difficulty (analogous to annealing schemes and curricula during training in RL) and show that the model exhibits rich behaviour, including delayed learning under sparse rewards; a variety of learning regimes depending on reward baselines; and a speed-accuracy trade-off driven by reward stringency. Experiments on variants of the Procgen game "Bossfight" and the Arcade Learning Environment game "Pong" also show such a speed-accuracy trade-off in practice. Together, these results take a step towards closing the gap between theory and practice in high-dimensional RL.


