ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

by Ran Gong, et al.

Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete (e.g., binary) object goal states, which makes it difficult to learn complex tasks and to transfer learned policies from simulated environments to the real world. Furthermore, state discretization limits a robot's ability to follow human instructions based on the grounding of actions and states. To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes. ARNOLD comprises 8 language-conditioned tasks that involve understanding object states and learning policies for continuous goals. To promote language-instructed learning, we provide expert demonstrations with template-generated language descriptions. We assess task performance using the latest language-conditioned policy learning models. Our results indicate that current models for language-conditioned manipulation still struggle significantly to generalize to novel goal states, novel scenes, and novel objects. These findings highlight the need for new algorithms that close this gap and underscore the potential for further research in this area. See our project page at:




