A unified algorithm framework for mean-variance optimization in discounted Markov decision processes

by Shuai Ma, et al.

This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The variance metric considered captures reward variability over the whole process, with future deviations discounted to their present values. This discounted mean-variance optimization yields a reward function that depends on a discounted mean, and this dependency renders traditional dynamic programming methods inapplicable because it destroys a crucial property: time consistency. To deal with this unorthodox problem, we introduce a pseudo mean that transforms the intractable MDP into a standard one with a redefined reward function, and we derive a discounted mean-variance performance difference formula. Building on the pseudo mean, we propose a unified algorithm framework with a bilevel optimization structure for discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems, including, but not limited to, risk-averse variance and mean-variance optimizations in discounted and average MDPs. Furthermore, convergence analyses missing from the literature can be supplied within the proposed framework. Taking value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm.
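The bilevel structure described in the abstract can be sketched in code. The following is a minimal, hypothetical illustration, not the authors' algorithm: it assumes the inner level solves a standard discounted MDP whose reward is penalized by squared deviation from a pseudo mean `y` (a common mean-variance reward transformation), and the outer level updates `y` toward the discounted mean reward of the resulting greedy policy. The specific reward form, the outer update rule, and the `(1 - gamma)` normalization are all assumptions for illustration.

```python
import numpy as np

def mean_variance_value_iteration(P, R, gamma=0.95, lam=1.0,
                                  outer_iters=50, inner_iters=500, tol=1e-8):
    """Bilevel sketch (illustrative, not the paper's exact algorithm).

    Outer loop: update a pseudo mean y.
    Inner loop: standard value iteration on the transformed reward
                r(s, a) - lam * (r(s, a) - y)^2  (an assumed form).

    P: transition tensor, shape (A, S, S); R: reward matrix, shape (S, A).
    """
    S, A = R.shape
    y = 0.0  # pseudo mean, initialized arbitrarily
    policy = np.zeros(S, dtype=int)
    for _ in range(outer_iters):
        # Inner problem: a standard discounted MDP with the transformed reward.
        R_tilde = R - lam * (R - y) ** 2                     # shape (S, A)
        V = np.zeros(S)
        for _ in range(inner_iters):
            Q = R_tilde + gamma * np.einsum('ast,t->sa', P, V)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        policy = Q.argmax(axis=1)
        # Outer update (assumed rule): set the pseudo mean to the normalized
        # discounted mean reward of the greedy policy, averaged over states.
        P_pi = P[policy, np.arange(S), :]                    # shape (S, S)
        r_pi = R[np.arange(S), policy]                       # shape (S,)
        V_mean = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        y_new = (1 - gamma) * V_mean.mean()
        if abs(y_new - y) < tol:
            y = y_new
            break
        y = y_new
    return policy, y
```

The inner loop is ordinary value iteration, so each outer step reuses standard MDP machinery; only the reward redefinition and the pseudo-mean update distinguish this from the risk-neutral case.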

