Some Upper Bounds on the Running Time of Policy Iteration on Deterministic MDPs

11/28/2022
by Ritesh Goenka, et al.

Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms, and affirmation that a conjecture regarding Howard's PI on MDPs is true for DMDPs. Our analysis is based on certain graph-theoretic results, which may be of independent interest.
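As a minimal illustration of the setting, the sketch below runs Howard's Policy Iteration on a small tabular deterministic MDP, where each state-action pair maps to a single next state. The data layout (next_state and reward arrays), the toy instance, and the numerical tolerance are illustrative assumptions and are not taken from the paper.

```python
# A minimal sketch of Howard's Policy Iteration on a deterministic MDP (DMDP),
# assuming a tabular instance given by next_state[s, a], reward[s, a], and a
# discount factor gamma < 1. Toy data below is illustrative only.
import numpy as np

def policy_evaluation(next_state, reward, policy, gamma):
    """Solve V = r_pi + gamma * P_pi V exactly for the current policy."""
    n = next_state.shape[0]
    P = np.zeros((n, n))
    r = np.zeros(n)
    for s in range(n):
        a = policy[s]
        P[s, next_state[s, a]] = 1.0   # deterministic transition
        r[s] = reward[s, a]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

def howard_policy_iteration(next_state, reward, gamma=0.9, tol=1e-12):
    n, _ = next_state.shape
    policy = np.zeros(n, dtype=int)
    while True:
        V = policy_evaluation(next_state, reward, policy, gamma)
        # Q(s, a) = r(s, a) + gamma * V(next(s, a)); transitions are deterministic.
        Q = reward + gamma * V[next_state]
        greedy = Q.argmax(axis=1)
        current = Q[np.arange(n), policy]
        best = Q[np.arange(n), greedy]
        # Terminate when no state has a strictly improving action.
        if np.all(best <= current + tol):
            return policy, V
        # Howard's PI: switch simultaneously at every improvable state.
        policy = np.where(best > current + tol, greedy, policy)

# Toy 3-state, 2-action DMDP (illustrative only).
next_state = np.array([[1, 2], [2, 0], [0, 1]])
reward = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.0]])
print(howard_policy_iteration(next_state, reward))
```

The evaluation step here solves the linear system directly; on a DMDP one could instead follow the single trajectory induced by the policy, which is the structure the paper's graph-theoretic analysis exploits.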


Related research

01/23/2013 · On the Complexity of Policy Iteration
Decision-making problems in uncertain or stochastic domains are often fo...

01/31/2023 · Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor
We introduce the Blackwell discount factor for Markov Decision Processes...

02/18/2015 · Influence-Optimistic Local Values for Multiagent Planning --- Extended Version
Recent years have seen the development of methods for multiagent plannin...

06/03/2013 · Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Given a Markov Decision Process (MDP) with n states and a total number m ...

01/29/2021 · Optimistic Policy Iteration for MDPs with Acyclic Transient State Structure
We consider Markov Decision Processes (MDPs) in which every stationary p...

07/16/2021 · Refined Policy Improvement Bounds for MDPs
The policy improvement bound on the difference of the discounted returns...

09/16/2020 · Lower Bounds for Policy Iteration on Multi-action MDPs
Policy Iteration (PI) is a classical family of algorithms to compute an ...
