ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

10/08/2020
by Mohit Shridhar, et al.

Given a simple request (e.g., Put a washed apple in the kitchen fridge), humans can reason in purely abstract terms by imagining action sequences and scoring their likelihood of success, prototypicality, and efficiency, all without moving a muscle. Once we see the kitchen in question, we can update our abstract plans to fit the scene. Embodied agents require the same abilities, but existing work does not yet provide the infrastructure necessary for both reasoning abstractly and executing concretely. We address this limitation by introducing ALFWorld, a simulator that enables agents to learn abstract, text-based policies in TextWorld (Côté et al., 2018) and then execute goals from the ALFRED benchmark (Shridhar et al., 2020) in a rich visual environment. ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions. In turn, as we demonstrate empirically, this fosters better agent generalization than training only in the visually grounded environment. BUTLER's simple, modular design factors the problem to allow researchers to focus on models for improving every piece of the pipeline (language understanding, planning, navigation, visual scene understanding, and so forth).
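To make the train-in-text, execute-visually idea concrete, below is a minimal sketch of the kind of text-based interaction loop that abstract policies are trained on. Everything in it (the `TextKitchen` class, its command set, the scripted policy) is an illustrative assumption for the "washed apple" example above, not ALFWorld's actual API; the real TextWorld-backed environments ship with the alfworld package.

```python
# Illustrative sketch only: a toy text environment with the same
# observation -> text command -> feedback loop that ALFWorld-style
# agents learn from. All class and method names here are hypothetical.

class TextKitchen:
    """Toy stand-in for a text-based household environment."""

    def __init__(self):
        self.apple_washed = False

    def reset(self):
        # Return the initial textual observation.
        self.apple_washed = False
        return "You are in the kitchen. You see an apple, a sink, and a fridge."

    def step(self, command):
        # Map a text command to (observation, reward, done).
        if command == "wash apple":
            self.apple_washed = True
            return "You wash the apple in the sink.", 0.0, False
        if command == "put apple in fridge":
            # Goal: a *washed* apple ends up in the fridge.
            done = self.apple_washed
            reward = 1.0 if done else 0.0
            return "You put the apple in the fridge.", reward, done
        return "Nothing happens.", 0.0, False


def scripted_policy(observation):
    # Stand-in for a learned text policy: observations in, commands out.
    if "sink" in observation and "wash" not in observation:
        return "wash apple"
    return "put apple in fridge"


env = TextKitchen()
obs = env.reset()
done = False
while not done:
    action = scripted_policy(obs)
    obs, reward, done = env.step(action)
    print(f"> {action}\n{obs} (reward={reward})")
```

In the full benchmark, a policy trained on loops like this one is then asked to drive an embodied agent in ALFRED's visual scenes, where a perception module translates rendered frames into textual observations of roughly this form.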

