Operator Splitting Value Iteration

11/25/2022
by Amin Rakhsha, et al.

We introduce new planning and reinforcement learning algorithms for discounted MDPs that utilize an approximate model of the environment to accelerate the convergence of the value function. Inspired by the splitting approach in numerical linear algebra, we introduce Operator Splitting Value Iteration (OS-VI) for both the Policy Evaluation and the Control problem. OS-VI achieves a much faster convergence rate when the model is accurate enough. We also introduce a sample-based version of the algorithm, called OS-Dyna. Unlike the traditional Dyna architecture, OS-Dyna still converges to the correct value function in the presence of model approximation error.
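To make the splitting idea concrete, below is a minimal sketch of how a splitting-based policy-evaluation update can be implemented for a tabular MDP. The update form V_{k+1} = (I - gamma * P_hat)^{-1} (r + gamma * (P - P_hat) V_k), the function name, and the stopping rule are illustrative assumptions drawn from the matrix-splitting analogy mentioned in the abstract, not a verbatim reproduction of the paper's algorithm.

import numpy as np

def splitting_policy_evaluation(P, P_hat, r, gamma, num_iters=100, tol=1e-8):
    # P      : true transition matrix of the evaluated policy, shape (n, n)
    # P_hat  : approximate transition matrix, shape (n, n)
    # r      : expected one-step reward vector, shape (n,)
    # gamma  : discount factor in [0, 1)
    #
    # Each iteration solves the Bellman equation of the approximate model
    # exactly and corrects it with the true model:
    #     V_{k+1} = (I - gamma P_hat)^{-1} (r + gamma (P - P_hat) V_k)
    # If P_hat equals P, the true value function is obtained in one step;
    # if P_hat is the zero matrix, the update reduces to standard value iteration.
    n = P.shape[0]
    A_hat = np.eye(n) - gamma * P_hat
    V = np.zeros(n)
    for _ in range(num_iters):
        V_next = np.linalg.solve(A_hat, r + gamma * (P - P_hat) @ V)
        if np.max(np.abs(V_next - V)) < tol:
            return V_next
        V = V_next
    return V

Note that the fixed point of this iteration satisfies (I - gamma P) V = r regardless of how accurate P_hat is; the quality of the approximate model only affects how quickly the iteration contracts, which mirrors the convergence behaviour described in the abstract.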

Related research

06/22/2021  Variance-Aware Off-Policy Evaluation with Linear Function Approximation
  We study the off-policy evaluation (OPE) problem in reinforcement learni...

06/15/2017  Reinforcement Learning under Model Mismatch
  We study reinforcement learning under model misspecification, where we d...

07/11/2012  Heuristic Search Value Iteration for POMDPs
  We present a novel POMDP planning algorithm called heuristic search valu...

06/27/2012  Chi-square Tests Driven Method for Learning the Structure of Factored MDPs
  SDYNA is a general framework designed to address large stochastic reinfo...

06/25/2019  Expected Sarsa(λ) with Control Variate for Variance Reduction
  Off-policy learning is powerful for reinforcement learning. However, the...

10/25/2021  Operator Augmentation for Model-based Policy Evaluation
  In model-based reinforcement learning, the transition matrix and reward ...

03/14/2019  Reinforcement Learning with Dynamic Boltzmann Softmax Updates
  Value function estimation is an important task in reinforcement learning...
