Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains

by Divyanshu Raj, et al.

In this work, we present an approach that uses language instructions to identify sub-tasks within a demonstrated robot trajectory. The language provided during a demonstration serves as guidance for locating sub-segments of a longer robot trajectory. Given a sequence of natural language instructions and a long trajectory consisting of image frames and discrete actions, we want to map each instruction to a smaller fragment of the trajectory. Unlike previous instruction-following work, which directly learns a mapping from language to a policy, we propose a language-conditioned change-point detection method to identify sub-tasks in a problem. Our approach learns the relationship between the constituent segments of a long language command and the corresponding constituent segments of a trajectory. These trajectory segments can then be used to learn sub-tasks or sub-goals for planning, or options, as demonstrated in previous related work. Our insight is that language-conditioned change-point detection for robots is similar to video moment retrieval, which is used to identify sub-segments within online videos. Through extensive experimentation, we demonstrate a 1.78 ± 0.82% improvement over a baseline approach in accurately identifying sub-tasks within a trajectory. Moreover, we present a comprehensive study of the sample complexity of learning this mapping between language and trajectory sub-segments, to understand whether video retrieval-based methods are realistic in real robot scenarios.
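To make the problem setup concrete, the following is a minimal sketch of how an instruction-to-segment mapping could be represented and scored. It is not the authors' implementation: the `Segment` type, the inclusive step indexing, and the use of temporal intersection-over-union (a measure commonly used in video moment retrieval) are all assumptions made for illustration, and the abstract does not state which metric the paper reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical trajectory fragment, given as inclusive step indices."""
    start: int
    end: int


def temporal_iou(pred: Segment, gold: Segment) -> float:
    """Overlap between a predicted and a reference segment, in [0, 1]."""
    # Number of steps shared by both segments (0 if they are disjoint).
    inter = max(0, min(pred.end, gold.end) - max(pred.start, gold.start) + 1)
    # Steps covered by either segment.
    union = (pred.end - pred.start + 1) + (gold.end - gold.start + 1) - inter
    return inter / union if union > 0 else 0.0


def change_points(segments: List[Segment]) -> List[int]:
    """Start index of every segment after the first, i.e. the boundaries."""
    return [s.start for s in segments[1:]]


# Illustrative data only: three instructions, each mapped to one fragment
# of a 15-step trajectory.
instructions = ["go to the table", "pick up the mug", "place it in the sink"]
predicted = [Segment(0, 4), Segment(5, 9), Segment(10, 14)]
reference = [Segment(0, 5), Segment(6, 9), Segment(10, 14)]

for text, p, g in zip(instructions, predicted, reference):
    print(f"{text!r}: IoU = {temporal_iou(p, g):.2f}")
print("change points:", change_points(predicted))
```

Representing each sub-task as a `(start, end)` pair keeps the change-point view and the moment-retrieval view interchangeable: the boundaries are recoverable from the segments, and each segment can be scored independently against its instruction.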




Contrastive Instruction-Trajectory Learning for Vision-Language Navigation

The vision-language navigation (VLN) task requires an agent to reach a t...

Translating Natural Language Instructions to Computer Programs for Robot Manipulation

It is highly desirable for robots that work alongside humans to be able ...

Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

Our goal is for robots to follow natural language instructions like "put...

LEMMA: Learning Language-Conditioned Multi-Robot Manipulation

Complex manipulation tasks often require robots with complementary capab...

Where to Play: Retrieval of Video Segments using Natural-Language Queries

In this paper, we propose a new approach for retrieval of video segments...

Chasing Ghosts: Instruction Following as Bayesian State Tracking

A visually-grounded navigation instruction can be interpreted as a seque...

Aligning Step-by-Step Instructional Diagrams to Video Demonstrations

Multimodal alignment facilitates the retrieval of instances from one mod...
