Learning Value Functions from Undirected State-only Experience

04/26/2022
by Matthew Chang, et al.

This paper tackles the problem of learning value functions from undirected state-only experience (state transitions without action labels, i.e., (s, s', r) tuples). We first theoretically characterize the applicability of Q-learning in this setting. We show that tabular Q-learning in discrete Markov decision processes (MDPs) learns the same value function under any refinement of the action space. This theoretical result motivates the design of Latent Action Q-learning (LAQ), an offline RL method that can learn effective value functions from state-only experience. LAQ learns value functions using Q-learning on discrete latent actions obtained through a latent-variable future prediction model. We show that LAQ can recover value functions that have high correlation with value functions learned using ground-truth actions. Value functions learned using LAQ lead to sample-efficient acquisition of goal-directed behavior, can be used with domain-specific low-level controllers, and facilitate transfer across embodiments. Our experiments in 5 environments, ranging from a 2D grid world to 3D visual navigation in realistic environments, demonstrate the benefits of LAQ over simpler alternatives, imitation learning oracles, and competing methods.
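The refinement claim in the abstract can be illustrated on a toy problem. The sketch below is not the paper's code or its experimental setup: the four-state deterministic chain, the action names, the discount factor, and the helper `q_iteration` are all assumptions made for illustration. It runs tabular Q-iteration on a fixed batch of transitions, once with the original action labels and once with each action split into three interchangeable sub-labels, and compares the resulting state values. It does not implement the latent-variable future prediction model that LAQ uses to obtain its latent actions.

```python
def q_iteration(transitions, gamma=0.9, sweeps=200):
    """Tabular Q-iteration over a fixed batch of (s, a, s_next, r) tuples.

    Returns V(s) = max_a Q(s, a) for every state with an outgoing transition.
    States with no outgoing transition are treated as terminal (value 0).
    """
    q = {}
    for s, a, _, _ in transitions:
        q.setdefault(s, {})[a] = 0.0

    for _ in range(sweeps):
        for s, a, s_next, r in transitions:
            # Bootstrap from the best known action value in the next state.
            next_v = max(q.get(s_next, {}).values(), default=0.0)
            q[s][a] = r + gamma * next_v

    return {s: max(action_values.values()) for s, action_values in q.items()}


# Hypothetical 4-state chain: 0 <-> 1 <-> 2 -> 3, reward 1.0 on reaching state 3.
transitions = [
    (0, "right", 1, 0.0),
    (1, "left", 0, 0.0),
    (1, "right", 2, 0.0),
    (2, "left", 1, 0.0),
    (2, "right", 3, 1.0),
]

# Refinement: every original action is split into three sub-actions that
# behave identically, so each transition appears under three distinct labels.
refined = [
    (s, (a, k), s_next, r)
    for k in range(3)
    for (s, a, s_next, r) in transitions
]

v_original = q_iteration(transitions)
v_refined = q_iteration(refined)
```

In this deterministic toy case the two value tables agree state by state, because the maximum over refined labels in each state equals the maximum over the original labels. The example only shows the direction of the argument; the paper's result concerns tabular Q-learning in discrete MDPs more generally.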


