A Hybrid PAC Reinforcement Learning Algorithm

09/05/2020
by Ashkan Zehfroosh et al.

This paper presents a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs) that retains favorable features of both of its parent approaches. The algorithm, called Dyna-Delayed Q-learning (DDQ), combines model-free and model-based learning and outperforms both in most cases. The paper includes a PAC analysis of the DDQ algorithm and a derivation of its sample complexity. Numerical results on a small grid-world example support the claim that the new algorithm is more sample-efficient than either of its parents.
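The paper's DDQ pseudocode is not reproduced in this abstract. As a rough illustration of the general hybrid idea it builds on, combining a model-free Q-learning update with model-based planning replays in the style of classic Dyna-Q (not the DDQ algorithm itself), here is a minimal sketch on a hypothetical 5-state chain MDP:

```python
import random

# Hypothetical 5-state chain MDP (not the paper's grid world): actions are
# 0 = left, 1 = right; reward 1 is earned on reaching the rightmost state.
N_STATES, ACTIONS = 5, (0, 1)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def backup(Q, s, a, s2, r, alpha, gamma):
    # One Q-learning backup; the terminal state bootstraps to zero.
    target = r + (0.0 if s2 == N_STATES - 1 else gamma * max(Q[s2]))
    Q[s][a] += alpha * (target - Q[s][a])

def dyna_q(episodes=50, alpha=0.5, gamma=0.9, eps=0.1, planning_steps=10, seed=0):
    rng = random.Random(seed)
    # Optimistic initialization encourages systematic exploration.
    Q = [[1.0, 1.0] for _ in range(N_STATES)]
    model = {}  # (s, a) -> (s2, r): learned deterministic model
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            a = (rng.choice(ACTIONS) if rng.random() < eps
                 else max(ACTIONS, key=lambda x: Q[s][x]))
            s2, r = step(s, a)
            backup(Q, s, a, s2, r, alpha, gamma)   # model-free (direct RL) update
            model[(s, a)] = (s2, r)                # record experience in the model
            for _ in range(planning_steps):        # model-based planning replays
                (ps, pa), (ps2, pr) = rng.choice(list(model.items()))
                backup(Q, ps, pa, ps2, pr, alpha, gamma)
            s = s2
    return Q

Q = dyna_q()
# The greedy policy should prefer "right" in every non-terminal state.
assert all(Q[s][1] > Q[s][0] for s in range(N_STATES - 1))
```

The planning loop is what makes the agent model-based: experience stored in the model is replayed many times per real step, so far fewer environment interactions are needed than with pure Q-learning. DDQ replaces the plain Q-learning backup with the Delayed Q-learning update rule to obtain its PAC guarantee.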


Related research

- Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning (11/01/2021): Recently there is a surge of interest in understanding the horizon-depen...
- PAC Reinforcement Learning Algorithm for General-Sum Markov Games (09/05/2020): This paper presents a theoretical framework for probably approximately c...
- Model-Based Reinforcement Learning in Contextual Decision Processes (11/21/2018): We study the sample complexity of model-based reinforcement learning in ...
- PAC Guarantees for Concurrent Reinforcement Learning with Restricted Communication (05/23/2019): We develop model free PAC performance guarantees for multiple concurrent...
- Active Coverage for PAC Reinforcement Learning (06/23/2023): Collecting and leveraging data with good coverage properties plays a cru...
- A Max-Min Entropy Framework for Reinforcement Learning (06/19/2021): In this paper, we propose a max-min entropy framework for reinforcement ...
- Incremental Model-based Learners With Formal Learning-Time Guarantees (06/27/2012): Model-based learning algorithms have been shown to use experience effici...
