Semi-Markov Offline Reinforcement Learning for Healthcare

03/17/2022
by Mehdi Fatemi, et al.

Reinforcement learning (RL) tasks are typically framed as Markov Decision Processes (MDPs), assuming that decisions are made at fixed time intervals. However, many applications of great importance, including healthcare, do not satisfy this assumption, yet they are commonly modelled as MDPs after an artificial reshaping of the data. In addition, most healthcare (and similar) problems are offline by nature, allowing for only retrospective studies. To address both challenges, we begin by discussing the Semi-MDP (SMDP) framework, which formally handles actions taken at variable time intervals. We next present a formal way to apply SMDP modifications to nearly any given value-based offline RL method. We use this theory to introduce three SMDP-based offline RL algorithms, namely, SDQN, SDDQN, and SBCQ. We then experimentally demonstrate that only these SMDP-based algorithms learn the optimal policy in variable-time environments, whereas their MDP counterparts do not. Finally, we apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention and demonstrate similar results.
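To make the distinction between fixed-interval and variable-interval decisions concrete, the sketch below contrasts a standard one-step Q-learning target with the textbook SMDP-style target, in which the discount factor is raised to the power of the elapsed time between decisions. This is an illustration under stated assumptions, not the paper's SDQN, SDDQN, or SBCQ formulation: the function names, the `tau` argument, and the convention that `reward` is already accumulated over the interval are all hypothetical choices made for this example.

```python
import numpy as np


def mdp_target(reward, next_q, gamma, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    Assumes every transition spans exactly one time step.
    """
    if done:
        return reward
    return reward + gamma * float(np.max(next_q))


def smdp_target(reward, next_q, gamma, tau, done):
    """SMDP-style target: r + gamma**tau * max_a' Q(s', a').

    `tau` is the (assumed known) time elapsed between this decision and
    the next one. `reward` is assumed to be the reward already
    accumulated over that interval.
    """
    if done:
        return reward
    return reward + (gamma ** tau) * float(np.max(next_q))


# Hypothetical transition: next decision happens 3 time units later.
next_q = np.array([2.0, 4.0])
fixed = mdp_target(1.0, next_q, gamma=0.5, done=False)
variable = smdp_target(1.0, next_q, gamma=0.5, tau=3, done=False)
```

With `tau = 1` the two targets coincide, so under these assumptions the fixed-interval target is the special case of the variable-interval one. For longer gaps, the future value is discounted more heavily, which a fixed-interval target ignores.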

research
04/15/2020

Reinforcement Learning in a Physics-Inspired Semi-Markov Environment

Reinforcement learning (RL) has been demonstrated to have great potentia...
research
10/18/2020

DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs

We study an approach to offline reinforcement learning (RL) based on opt...
research
09/18/2022

Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

We study the offline reinforcement learning (RL) in the face of unmeasur...
research
02/29/2012

Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization

The use of Reinforcement Learning in real-world scenarios is strongly li...
research
01/31/2022

Reinforcement Learning with Heterogeneous Data: Estimation and Inference

Reinforcement Learning (RL) has the promise of providing data-driven sup...
research
05/05/2022

Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning

Dynamic mechanism design has garnered significant attention from both co...
research
03/22/2021

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Offline (or batch) reinforcement learning (RL) algorithms seek to learn ...
