Stochastically Dominant Distributional Reinforcement Learning

05/17/2019
by   John D. Martin, et al.

We describe a new approach for mitigating risk in reinforcement learning. Instead of reasoning about expected utility, we use second-order stochastic dominance (SSD) to directly compare the inherent risk of the random returns induced by different actions. To accommodate the SSD relation, we frame the RL optimization in the space of probability measures, treating Bellman's equation as a potential energy functional. This brings us to Wasserstein gradient flows, for which optimality and convergence are well understood. We propose a discrete-measure approximation algorithm, the Dominant Particle Agent (DPA), and demonstrate that DPA balances safety and performance better than existing baselines.
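
The SSD relation the abstract leans on has a simple integral characterization: a return X dominates Y in SSD when E[(t - X)_+] <= E[(t - Y)_+] for every threshold t, i.e. when every risk-averse (concave, nondecreasing utility) agent weakly prefers X. The sketch below is not from the paper; `ssd_dominates` is a hypothetical helper showing how the check reduces to comparing expected shortfalls of two empirical return samples, the kind of discrete measures a particle agent maintains.

```python
import numpy as np

def ssd_dominates(x, y, tol=1e-12):
    """Test whether the empirical distribution of returns `x` second-order
    stochastically dominates that of `y`.

    X dominates Y in SSD iff E[(t - X)_+] <= E[(t - Y)_+] for every
    threshold t (equivalently, X's integrated CDF lies below Y's).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.union1d(x, y)  # sorted, unique pooled sample values
    # Expected shortfall E[(t - Z)_+] under each empirical measure,
    # evaluated at every pooled sample value t.
    shortfall_x = np.maximum(grid[:, None] - x[None, :], 0.0).mean(axis=1)
    shortfall_y = np.maximum(grid[:, None] - y[None, :], 0.0).mean(axis=1)
    return bool(np.all(shortfall_x <= shortfall_y + tol))

# A sure payoff of 1 dominates a fair 50/50 bet on {0, 2}: same mean,
# but every risk-averse agent prefers the sure payoff.
safe = np.array([1.0])
risky = np.array([0.0, 2.0])
print(ssd_dominates(safe, risky))   # True
print(ssd_dominates(risky, safe))   # False
```

Checking only the pooled sample values suffices here because both shortfall curves are piecewise linear in t with kinks only at sample points, so the comparison on that grid decides the relation exactly for discrete measures.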

Related research

04/27/2023 · One-Step Distributional Reinforcement Learning
Reinforcement learning (RL) allows an agent interacting sequentially wit...

08/09/2018 · Policy Optimization as Wasserstein Gradient Flows
Policy optimization is a core component of reinforcement learning (RL), ...

07/02/2023 · Is Risk-Sensitive Reinforcement Learning Properly Resolved?
Due to the nature of risk management in learning applicable policies, ri...

06/28/2022 · Risk Perspective Exploration in Distributional Reinforcement Learning
Distributional reinforcement learning demonstrates state-of-the-art perf...

02/01/2021 · Risk Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search
In many risk-aware and multi-objective reinforcement learning settings, ...

10/12/2020 · Efficient Wasserstein Natural Gradients for Reinforcement Learning
A novel optimization approach is proposed for application to policy grad...
