Learning Factored Markov Decision Processes with Unawareness

02/27/2019
by Craig Innes, et al.

Methods for learning and planning in sequential decision problems often assume the learner is aware of all possible states and actions in advance. This assumption is sometimes untenable. In this paper, we give a method to learn factored Markov decision processes (MDPs) from both domain exploration and expert assistance, which guarantees convergence to near-optimal behaviour even when the agent begins unaware of factors critical to success. Our experiments show that our agent learns optimal behaviour on both small and large problems, and that conserving what it has already learned when it discovers new possibilities results in faster convergence.
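
The abstract's central idea is that an agent unaware of some state factors can still learn, and that it should conserve what it has already learned when it becomes aware of a new factor. The sketch below is an illustrative toy (not the paper's algorithm): a tabular learner keeps transition counts over the factors it currently knows about and, when a new factor is revealed (e.g., by an expert), extends its existing counts rather than discarding them. Names such as `UnawareFactoredLearner` and `add_factor` are hypothetical.

```python
# Minimal sketch, assuming tabular transition counts over a factored state.
# This is not the method from the paper; it only illustrates "conserving
# information on discovering new possibilities".
from collections import defaultdict


class UnawareFactoredLearner:
    def __init__(self, factors, actions):
        self.factors = list(factors)      # state variables the agent knows about
        self.actions = list(actions)
        # counts[(state, action, next_state)] over the currently known factors
        self.counts = defaultdict(int)

    def project(self, full_state):
        """Restrict an environment state (a dict) to the known factors."""
        return tuple(full_state[f] for f in self.factors)

    def observe(self, s, a, s_next):
        """Record one transition, seen only through the known factors."""
        self.counts[(self.project(s), a, self.project(s_next))] += 1

    def add_factor(self, name, default_value):
        """Become aware of a new factor without throwing away old experience:
        past observations are kept, treated as if the new factor had taken
        its default value when they were collected."""
        self.factors.append(name)
        old_counts = self.counts
        self.counts = defaultdict(int)
        for (s, a, s2), n in old_counts.items():
            self.counts[(s + (default_value,), a, s2 + (default_value,))] += n


# Usage: learn over {"pos"}, then discover a "key" factor mid-way.
learner = UnawareFactoredLearner(factors=["pos"], actions=["left", "right"])
learner.observe({"pos": 0, "key": False}, "right", {"pos": 1, "key": False})
learner.add_factor("key", default_value=False)   # old counts are preserved
```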


