A Scheme for Dynamic Risk-Sensitive Sequential Decision Making

07/09/2019
by   Shuai Ma, et al.
0

We present a scheme for sequential decision making with a risk-sensitive objective and constraints in a dynamic environment. A neural network is trained as an approximator of the mapping from parameter space to space of risk and policy with risk-sensitive constraints. For a given risk-sensitive problem, in which the objective and constraints are, or can be estimated by, functions of the mean and variance of return, we generate a synthetic dataset as training data. Parameters defining a targeted process might be dynamic, i.e., they might vary over time, so we sample them within specified intervals to deal with these dynamics. We show that: i). Most risk measures can be estimated using return variance; ii). By virtue of the state-augmentation transformation, practical problems modeled by Markov decision processes with stochastic rewards can be solved in a risk-sensitive scenario; and iii). The proposed scheme is validated by a numerical experiment.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/09/2019

Variance-Based Risk Estimations in Markov Processes via Transformation with State Lumping

Variance plays a crucial role in risk-sensitive reinforcement learning, ...
research
02/28/2018

Verification of Markov Decision Processes with Risk-Sensitive Measures

We develop a method for computing policies in Markov decision processes ...
research
06/23/2020

Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward Uncertainty

We introduce a novel framework to account for sensitivity to rewards unc...
research
03/25/2014

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

In many sequential decision-making problems we may want to manage risk b...
research
02/22/2022

Approximate gradient ascent methods for distortion risk measures

We propose approximate gradient ascent algorithms for risk-sensitive rei...
research
11/04/2021

Model-Free Risk-Sensitive Reinforcement Learning

We extend temporal-difference (TD) learning in order to obtain risk-sens...
research
08/19/2022

A Risk-Sensitive Approach to Policy Optimization

Standard deep reinforcement learning (DRL) aims to maximize expected rew...

Please sign up or login with your details

Forgot password? Click here to reset