A Robust Policy Bootstrapping Algorithm for Multi-objective Reinforcement Learning in Non-stationary Environments

by Sherif Abdelfattah, et al.

Multi-objective Markov decision processes are a class of multi-objective optimization problems that involve sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address these problems by combining the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is their lack of adaptability to non-stationary dynamics in the environment, because they adopt optimization procedures that assume stationarity when evolving a coverage set of policies that can solve the problem. This paper introduces a developmental optimization approach that evolves the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies online in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. The results show that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.
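
To make the notion of a convex coverage set concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes two objectives, linear scalarization of the objectives by a preference weight, and a small set of made-up policy value vectors; the function name `approximate_ccs` and all numbers are hypothetical and chosen only for illustration. The sketch keeps every policy that is optimal for at least one sampled preference weight.

```python
import numpy as np


def approximate_ccs(values, num_weights=101):
    """Approximate the convex coverage set for two objectives.

    values: one row per policy, one column per objective (illustrative data).
    Returns the sorted indices of policies that maximize the linearly
    scalarized value for at least one sampled preference weight.
    """
    values = np.asarray(values, dtype=float)
    if values.ndim != 2 or values.shape[1] != 2:
        raise ValueError("this sketch assumes exactly two objectives")

    # Sample preferences (w, 1 - w) evenly over the preference space.
    w = np.linspace(0.0, 1.0, num_weights)
    weights = np.stack([w, 1.0 - w], axis=1)   # shape: (num_weights, 2)

    # Scalarized value of every policy under every sampled preference.
    scalarized = weights @ values.T            # shape: (num_weights, num_policies)

    # Keep the best policy for each preference.
    best = np.argmax(scalarized, axis=1)
    return sorted(set(best.tolist()))


# Hypothetical expected-return vectors for five policies.
policy_values = [
    [10.0, 0.0],   # 0: best when only objective 1 matters
    [0.0, 10.0],   # 1: best when only objective 2 matters
    [6.0, 6.0],    # 2: balanced compromise
    [4.0, 4.0],    # 3: dominated by policy 2
    [2.0, 8.0],    # 4: not dominated, but never best under any linear preference
]

print(approximate_ccs(policy_values))  # [0, 1, 2]
```

In this toy data, policy 4 is not dominated by any other policy, yet no linear preference makes it the best choice, so it falls outside the convex coverage set. In a non-stationary environment the value vectors themselves would drift over time, which is the situation the abstract says stationary optimization procedures do not handle.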



Intrinsically Motivated Hierarchical Policy Learning in Multi-objective Markov Decision Processes

Multi-objective Markov decision processes are sequential decision-making...

Exploration-Exploitation Trade-off in Reinforcement Learning on Online Markov Decision Processes with Global Concave Rewards

We consider an agent who is involved in a Markov decision process and re...

Answer Set Programming for Non-Stationary Markov Decision Processes

Non-stationary domains, where unforeseen changes happen, present a chall...

Dynamic Resource Configuration for Low-Power IoT Networks: A Multi-Objective Reinforcement Learning Method

Considering grant-free transmissions in low-power IoT networks with unkn...

Deep W-Networks: Solving Multi-Objective Optimisation Problems With Deep Reinforcement Learning

In this paper, we build on advances introduced by the Deep Q-Networks (D...

Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games

This paper addresses policy learning in non-stationary environments and ...

Oiling the Wheels of Change: The Role of Adaptive Automatic Problem Decomposition in Non-Stationary Environments

Genetic algorithms (GAs) that solve hard problems quickly, reliably and ...
