Approximate Policy Iteration Schemes: A Comparison

05/12/2014
by Bruno Scherrer, et al.

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variants of the Policy Iteration algorithm: Approximate Policy Iteration (API), Conservative Policy Iteration (CPI), a natural adaptation of the Policy Search by Dynamic Programming algorithm to the infinite-horizon case (PSDP_∞), and the recently proposed Non-Stationary Policy Iteration (NSPI(m)). For all of these algorithms, we describe performance bounds and compare them, paying particular attention to the concentrability constants involved, the number of iterations, and the memory required. Our analysis highlights the following points: 1) The performance guarantee of CPI can be arbitrarily better than that of API/API(α), but this comes at the cost of a relative increase, exponential in 1/ϵ, in the number of iterations. 2) PSDP_∞ enjoys the best of both worlds: its performance guarantee is similar to that of CPI, but it requires a number of iterations similar to that of API. 3) Contrary to API, which requires constant memory, the memory needed by CPI and PSDP_∞ is proportional to their number of iterations, which may be problematic when the discount factor γ is close to 1 or the approximation error ϵ is close to 0; we show that the NSPI(m) algorithm allows an overall trade-off between memory and performance. Simulations with these schemes confirm our analysis.
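All the schemes compared above are approximate variants of the classical Policy Iteration loop: evaluate the current policy, then improve it greedily. The following sketch (not code from the paper) runs exact Policy Iteration on a made-up 2-state, 2-action MDP; the approximate schemes (API, CPI, PSDP_∞, NSPI(m)) differ in how the evaluation step below is approximated and how the improvement step is damped or made non-stationary. All numerical values (`P`, `r`, `GAMMA`) are illustrative assumptions.

```python
# Exact Policy Iteration on a toy MDP (illustrative sketch; the MDP and
# its numbers are invented for demonstration, not taken from the paper).

GAMMA = 0.9          # discount factor
S, A = 2, 2          # number of states, number of actions

# P[s][a][t] = probability of moving to state t when taking action a in state s
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.3, 0.7]]]
# r[s][a] = expected immediate reward
r = [[1.0, 0.0],
     [0.0, 2.0]]

def evaluate(pi, iters=500):
    """Iterative policy evaluation: returns V ≈ V^pi (the step that the
    approximate schemes replace with a function approximator)."""
    V = [0.0] * S
    for _ in range(iters):
        V = [r[s][pi[s]] + GAMMA * sum(P[s][pi[s]][t] * V[t] for t in range(S))
             for s in range(S)]
    return V

def greedy(V):
    """Greedy improvement: pi'(s) = argmax_a Q(s, a) with respect to V."""
    def Q(s, a):
        return r[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(S))
    return [max(range(A), key=lambda a: Q(s, a)) for s in range(S)]

pi = [0, 0]
while True:
    new_pi = greedy(evaluate(pi))
    if new_pi == pi:      # policy stable: optimal for this finite MDP
        break
    pi = new_pi
print("optimal policy:", pi)   # prints: optimal policy: [1, 1]
```

In the exact setting this loop terminates at an optimal policy after finitely many iterations and needs only constant memory (one policy); the paper's comparison concerns what happens to the iteration count, memory, and error propagation once `evaluate` is only accurate up to ϵ.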


