Impossibility of deducing preferences and rationality from human policy

by Stuart Armstrong, et al.

Inverse reinforcement learning (IRL) attempts to infer human rewards or preferences from observed behaviour. However, human planning systematically deviates from rationality. Although some IRL work assumes humans are noisily rational, there has been little analysis of the general problem of inferring the reward of a human of unknown rationality. The observed behaviour can, in principle, be decomposed into two components: a reward function and a planning algorithm that maps the reward function to a policy. Both of these variables have to be inferred from behaviour. This paper presents a "No Free Lunch" theorem in this area, showing that, without making "normative" assumptions beyond the data, nothing about the human reward function can be deduced from human behaviour. Unlike most No Free Lunch theorems, this one cannot be alleviated by regularising with simplicity assumptions: the simplest hypotheses are generally degenerate. The paper then sketches how normative assumptions might be used to get around the problem; without such assumptions, solving the general IRL problem is impossible. The reward function-planning algorithm formalism can also be used to encode what it means for an agent to manipulate or override human preferences.
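The degeneracy described above can be made concrete with a toy example (not from the paper; the reward function and state/action sets here are illustrative assumptions): two different (planner, reward) pairs that induce exactly the same policy, so no amount of observed behaviour can tell them apart.

```python
# Toy illustration: distinct (planner, reward) decompositions of the
# same observed policy. All names here are hypothetical, not from the paper.

ACTIONS = ["left", "right"]

def reward(state, action):
    # Assumed illustrative reward: the agent prefers "right".
    return 1.0 if action == "right" else 0.0

def rational(R):
    # A fully rational planner: pick the reward-maximising action.
    return lambda state: max(ACTIONS, key=lambda a: R(state, a))

def anti_rational(R):
    # A fully anti-rational planner: pick the reward-minimising action.
    return lambda state: min(ACTIONS, key=lambda a: R(state, a))

neg_reward = lambda s, a: -reward(s, a)

pi1 = rational(reward)           # rational planner paired with R
pi2 = anti_rational(neg_reward)  # anti-rational planner paired with -R

# Both decompositions yield the identical observed policy.
assert all(pi1(s) == pi2(s) for s in range(5))
```

Because the pair (rational, R) and the pair (anti-rational, -R) are behaviourally indistinguishable, an observer with no normative assumptions cannot even recover the sign of the reward function, let alone its details.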

