Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

05/26/2020
by Gen Li, et al.

We investigate the sample efficiency of reinforcement learning in a γ-discounted infinite-horizon Markov decision process (MDP) with state space S and action space A, assuming access to a generative model. Although a number of prior works have tackled this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, prior results suffer from a sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least |S||A|/(1-γ)^2 (up to some log factor). The current paper overcomes this barrier by certifying the minimax optimality of model-based reinforcement learning as soon as the sample size exceeds the order of |S||A|/(1-γ) (modulo some log factor). More specifically, a perturbed model-based planning algorithm provably finds an ε-optimal policy with an order of |S||A|/((1-γ)^3 ε^2) · log(|S||A|/((1-γ)ε)) samples for any ε ∈ (0, 1/(1-γ)]. Along the way, we derive improved (instance-dependent) guarantees for model-based policy evaluation. To the best of our knowledge, this work provides the first minimax-optimal guarantee in a generative model that accommodates the entire range of sample sizes (beyond which finding a meaningful policy is information-theoretically impossible).
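The general shape of perturbed model-based planning can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tabular MDP with known rewards, a hypothetical sampler callback `sample_next_state(s, a, rng)` standing in for the generative model, a uniform reward perturbation whose scale `perturb_scale` is an arbitrary illustrative choice, and plain value iteration as the planner. The abstract specifies none of these details.

```python
import numpy as np

def perturbed_model_based_planning(sample_next_state, rewards, gamma, n_per_pair,
                                   perturb_scale=1e-3, n_iters=1000, tol=1e-8, seed=0):
    """Plan in an empirical MDP built from generative-model samples.

    sample_next_state(s, a, rng) is a hypothetical callback that returns one
    next-state index drawn from the true (unknown) transition kernel.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = rewards.shape

    # Step 1: draw n_per_pair samples for every (s, a) and form the
    # empirical transition kernel P_hat from the visit counts.
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_per_pair):
                P_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    P_hat /= n_per_pair

    # Step 2: add a small random perturbation to the rewards (assumed here to
    # be uniform on [0, perturb_scale]; the scale is illustrative only).
    r_pert = rewards + rng.uniform(0.0, perturb_scale, size=rewards.shape)

    # Step 3: solve the perturbed empirical MDP with value iteration.
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V = Q.max(axis=1)
        Q_new = r_pert + gamma * (P_hat @ V)  # (S, A, S) @ (S,) -> (S, A)
        converged = np.max(np.abs(Q_new - Q)) < tol
        Q = Q_new
        if converged:
            break

    # The returned policy is greedy with respect to the perturbed empirical Q.
    return Q.argmax(axis=1), Q
```

The total number of samples drawn is `n_per_pair * |S| * |A|`, which is the quantity the sample-complexity bounds above refer to.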

research
05/28/2021

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

The curse of dimensionality is a widely known issue in reinforcement lea...
research
06/10/2019

On the Optimality of Sparse Model-Based Planning for Markov Decision Processes

This work considers the sample complexity of obtaining an ϵ-optimal poli...
research
02/12/2021

Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis

Q-learning, which seeks to learn the optimal Q-function of a Markov deci...
research
10/09/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Achieving sample efficiency in online episodic reinforcement learning (R...
research
09/13/2022

Semiparametric Estimation of Optimal Dividend Barrier for Spectrally Negative Lévy Process

We discuss a statistical estimation problem of an optimal dividend barri...
research
02/10/2023

Towards Minimax Optimality of Model-based Robust Reinforcement Learning

We study the sample complexity of obtaining an ϵ-optimal policy in Robus...
research
07/25/2023

Settling the Sample Complexity of Online Reinforcement Learning

A central issue lying at the heart of online reinforcement learning (RL)...
