Approximate Policy Iteration Schemes: A Comparison

by Bruno Scherrer, et al.

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration (API), Conservative Policy Iteration (CPI), a natural adaptation of the Policy Search by Dynamic Programming algorithm to the infinite-horizon case (PSDP_∞), and the recently proposed Non-Stationary Policy Iteration (NSPI(m)). For all algorithms, we describe performance bounds and compare them, paying particular attention to the concentrability constants involved, the number of iterations, and the memory required. Our analysis highlights the following points: 1) The performance guarantee of CPI can be arbitrarily better than that of API/API(α), but this comes at the cost of a relative increase in the number of iterations that is exponential in 1/ϵ. 2) PSDP_∞ enjoys the best of both worlds: its performance guarantee is similar to that of CPI, but it is obtained within a number of iterations similar to that of API. 3) Unlike API, which requires constant memory, CPI and PSDP_∞ need memory proportional to their number of iterations, which may be problematic when the discount factor γ is close to 1 or the approximation error ϵ is close to 0; we show that the NSPI(m) algorithm allows one to make an overall trade-off between memory and performance. Simulations with these schemes confirm our analysis.
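
To make the memory/performance trade-off of point 3 more concrete, here is a minimal Python sketch of an NSPI(m)-style loop on a small tabular MDP. It rests on assumptions that are not taken from the paper: policy evaluation is exact, the approximation error ϵ of the greedy step is simulated by adding uniform noise of amplitude eps to the Q-values, and the helper names (evaluate_periodic, noisy_greedy, nspi) and the all-zeros initial policy are hypothetical. It assumes that each new policy is greedy with respect to the value of the m-periodic non-stationary policy that plays the most recent policy first and then the m−1 previous ones, cyclically. Only the last m policies are stored, so memory is bounded by m, and m = 1 reduces to API. During the first iterations fewer than m policies exist, and the sketch simply cycles through those (a simplification).

```python
import numpy as np


def evaluate_periodic(P, R, gamma, policies):
    """Exact value of the periodic non-stationary policy that plays
    policies[0] at t=0, policies[1] at t=1, ..., then repeats.

    P: (A, S, S) transitions, P[a, s, s'] = Pr(s' | s, a)
    R: (S, A) expected rewards
    policies: list of (S,) integer arrays (deterministic policies)
    """
    S = R.shape[0]
    m = len(policies)
    states = np.arange(S)
    M = np.eye(S)        # running product P_{pi_0} ... P_{pi_{i-1}}
    r = np.zeros(S)      # discounted reward collected over one period
    for i, pi in enumerate(policies):
        P_pi = P[pi, states]          # row s is P[pi[s], s, :]
        r_pi = R[states, pi]
        r += gamma ** i * (M @ r_pi)
        M = M @ P_pi
    # Fixed point over one full period: v = r + gamma^m * M v
    return np.linalg.solve(np.eye(S) - gamma ** m * M, r)


def noisy_greedy(P, R, gamma, v, eps, rng):
    """Greedy policy w.r.t. v; hypothetical uniform noise on Q
    stands in for the approximation error epsilon."""
    Q = R + gamma * np.einsum("ast,t->sa", P, v)   # Q[s, a]
    if eps > 0:
        Q = Q + rng.uniform(-eps, eps, size=Q.shape)
    return Q.argmax(axis=1)


def nspi(P, R, gamma, m, n_iter, eps, seed=0):
    """Sketch of an NSPI(m)-style loop; returns the last m policies, newest first."""
    rng = np.random.default_rng(seed)
    S = R.shape[0]
    memory = [np.zeros(S, dtype=int)]  # hypothetical initial policy
    for _ in range(n_iter):
        v = evaluate_periodic(P, R, gamma, memory)
        new_pi = noisy_greedy(P, R, gamma, v, eps, rng)
        memory = ([new_pi] + memory)[:m]   # keep at most m policies
    return memory


# Usage on a random toy MDP (all numbers are arbitrary).
gen = np.random.default_rng(1)
S, A, gamma = 10, 3, 0.9
P = gen.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)
R = gen.random((S, A))

for m in (1, 5):
    policies = nspi(P, R, gamma, m=m, n_iter=30, eps=0.5)
    v = evaluate_periodic(P, R, gamma, policies)
    print(f"m={m}: stored policies={len(policies)}, mean value={v.mean():.4f}")
```

Comparing the printed values for m = 1 (the constant-memory API baseline) and larger m on such toy MDPs is one way to explore the trade-off; the sketch does not implement CPI or PSDP_∞.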

