Multimodal Grounding for Embodied AI via Augmented Reality Headsets for Natural Language Driven Task Planning

by Selma Wanna, et al.
The University of Texas at Austin

Recent advances in generative modeling have spurred a resurgence in the field of Embodied Artificial Intelligence (EAI). EAI systems typically deploy large language models to physical systems capable of interacting with their environment. In our exploration of EAI for industrial domains, we demonstrate the feasibility of co-located human-robot teaming. Specifically, we construct an experiment in which an Augmented Reality (AR) headset mediates information exchange between an EAI agent and a human operator across a variety of inspection tasks. To our knowledge, the use of an AR headset for multimodal grounding and the application of EAI to industrial tasks are novel contributions within Embodied AI research. In addition, we highlight potential pitfalls in constructing EAI systems by providing quantitative and qualitative analyses of prompt robustness.
