Approximate Policy Iteration Schemes: A Comparison

05/12/2014
by Bruno Scherrer, et al.

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variants of the Policy Iteration algorithm: Approximate Policy Iteration (API), Conservative Policy Iteration (CPI), a natural adaptation of the Policy Search by Dynamic Programming algorithm to the infinite-horizon case (PSDP_∞), and the recently proposed Non-Stationary Policy Iteration (NSPI(m)). For all of these algorithms, we describe performance bounds and compare them, paying particular attention to the concentrability constants involved, the number of iterations, and the memory required. Our analysis highlights the following points: 1) The performance guarantee of CPI can be arbitrarily better than that of API/API(α), but this comes at the cost of a relative increase, exponential in 1/ϵ, in the number of iterations. 2) PSDP_∞ enjoys the best of both worlds: its performance guarantee is similar to that of CPI, but it requires a number of iterations similar to that of API. 3) Contrary to API, which requires constant memory, the memory needed by CPI and PSDP_∞ is proportional to their number of iterations, which may be problematic when the discount factor γ is close to 1 or the approximation error ϵ is close to 0; we show that the NSPI(m) algorithm allows an overall trade-off between memory and performance. Simulations with these schemes confirm our analysis.
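All the schemes compared above are approximate variants of the classical Policy Iteration loop: evaluate the current policy, then improve it greedily. The following sketch (not code from the paper) runs exact Policy Iteration on a made-up 2-state, 2-action MDP; the approximate schemes (API, CPI, PSDP_∞, NSPI(m)) differ in how the evaluation step below is approximated and how the improvement step is damped or made non-stationary. All numerical values (`P`, `r`, `GAMMA`) are illustrative assumptions.

```python
# Exact Policy Iteration on a toy MDP (illustrative sketch; the MDP and
# its numbers are invented for demonstration, not taken from the paper).

GAMMA = 0.9          # discount factor
S, A = 2, 2          # number of states, number of actions

# P[s][a][t] = probability of moving to state t when taking action a in state s
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.3, 0.7]]]
# r[s][a] = expected immediate reward
r = [[1.0, 0.0],
     [0.0, 2.0]]

def evaluate(pi, iters=500):
    """Iterative policy evaluation: returns V ≈ V^pi (the step that the
    approximate schemes replace with a function approximator)."""
    V = [0.0] * S
    for _ in range(iters):
        V = [r[s][pi[s]] + GAMMA * sum(P[s][pi[s]][t] * V[t] for t in range(S))
             for s in range(S)]
    return V

def greedy(V):
    """Greedy improvement: pi'(s) = argmax_a Q(s, a) with respect to V."""
    def Q(s, a):
        return r[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(S))
    return [max(range(A), key=lambda a: Q(s, a)) for s in range(S)]

pi = [0, 0]
while True:
    new_pi = greedy(evaluate(pi))
    if new_pi == pi:      # policy stable: optimal for this finite MDP
        break
    pi = new_pi
print("optimal policy:", pi)   # prints: optimal policy: [1, 1]
```

In the exact setting this loop terminates at an optimal policy after finitely many iterations and needs only constant memory (one policy); the paper's comparison concerns what happens to the iteration count, memory, and error propagation once `evaluate` is only accurate up to ϵ.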


