Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act

by Alexis Jacq, et al.

Traditionally, Reinforcement Learning (RL) aims at deciding how to act optimally for an artificial agent. We argue that deciding when to act is equally important. As humans, we drift from default, instinctive or memorized behaviors to focused, thought-out behaviors when required by the situation. To enhance RL agents with this aptitude, we propose to augment the standard Markov Decision Process and make a new mode of action available: being lazy, which defers decision-making to a default policy. In addition, we penalize non-lazy actions in order to encourage minimal effort and have agents focus on critical decisions only. We name the resulting formalism lazy-MDPs. We study the theoretical properties of lazy-MDPs, expressing value functions and characterizing optimal solutions. Then we empirically demonstrate that policies learned in lazy-MDPs generally come with a form of interpretability: by construction, they show us the states where the agent takes control over the default policy. We deem those states and corresponding actions important since they explain the difference in performance between the default and the new, lazy policy. With suboptimal policies as default (pretrained or random), we observe that agents are able to get competitive performance in Atari games while only taking control in a limited subset of states.
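The construction described above can be sketched as a thin wrapper around a base MDP: one extra action defers to the default policy, and every non-lazy action pays a penalty. This is a minimal illustrative sketch, not the authors' implementation; the names (`base_step`, `default_policy`, `eta`) and the scalar penalty form are assumptions for illustration.

```python
class LazyMDP:
    """Augments a base MDP with a 'lazy' action that defers the choice
    to a default policy, while every non-lazy action is penalized by
    eta to encourage taking control only in critical states."""

    def __init__(self, base_step, n_actions, default_policy, eta=0.1):
        self.base_step = base_step        # (state, action) -> (next_state, reward)
        self.n_actions = n_actions        # number of actions in the base MDP
        self.lazy_action = n_actions      # index of the extra 'lazy' action
        self.default_policy = default_policy
        self.eta = eta                    # cost of a non-lazy (controlling) action

    def step(self, state, action):
        if action == self.lazy_action:
            # Defer: the default policy acts, and no penalty is charged.
            return self.base_step(state, self.default_policy(state))
        # Take control: act directly in the base MDP and pay the penalty.
        next_state, reward = self.base_step(state, action)
        return next_state, reward - self.eta
```

An agent trained in this wrapped process is pushed toward the lazy action wherever the default policy is good enough, so the states in which it still takes control form the interpretable "important" subset the abstract refers to.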

Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring

We study Markov decision processes (MDPs), where agents have direct cont...

Be Considerate: Objectives, Side Effects, and Deciding How to Act

Recent work in AI safety has highlighted that in sequential decision mak...

Control with adaptive Q-learning

This paper evaluates adaptive Q-learning (AQL) and single-partition adap...

Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations

In real-world reinforcement learning (RL) systems, various forms of impa...

Ranking Policy Decisions

Policies trained via Reinforcement Learning (RL) are often needlessly co...

Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints

Many real-world settings involve costs for performing actions; transacti...

MDPFuzzer: Finding Crash-Triggering State Sequences in Models Solving the Markov Decision Process

The Markov decision process (MDP) provides a mathematical framework for ...
