VisualHints: A Visual-Lingual Environment for Multimodal Reinforcement Learning

10/26/2020
by Thomas Carta, et al.

We present VisualHints, a novel environment for multimodal reinforcement learning (RL) that combines text-based interaction with visual hints obtained from the environment. Real-life problems often require agents to use both natural language information and visual perception to reach a goal. However, most traditional RL environments either pose purely vision-based tasks, such as Atari games or video-based robotic manipulation, or rely entirely on natural language as the mode of interaction, as in text-based games and dialog systems. In this work, we aim to bridge this gap and unify the two approaches in a single environment for multimodal RL. We introduce an extension of the TextWorld cooking environment in which visual clues are interspersed throughout the environment. The goal is to force an RL agent to use both text and visual features to predict natural language action commands that solve the final task of cooking a meal. The environment supports variations and difficulty levels to emulate a range of interactive real-world scenarios. We present a baseline multimodal agent for such problems that uses CNN-based feature extraction for the visual hints and LSTMs for textual feature extraction. We believe that the proposed visual-lingual environment will facilitate novel problem settings for the RL community.
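
The two-branch idea behind the baseline agent (one encoder per modality, with both feature vectors feeding the choice of a text command) can be illustrated with a minimal sketch. This is not the authors' implementation. The hashed bag-of-words function stands in for the LSTM and the chunk-averaging function stands in for the CNN, so the example runs with the Python standard library alone. Fusion by concatenation, scoring against a fixed list of candidate commands, the untrained random projection, and all names (`encode_text`, `encode_image`, `fuse`, `command_probabilities`, `DIM`) are assumptions made for illustration.

```python
import math
import random

DIM = 8  # per-modality feature size (arbitrary, for illustration)


def _normalise(vec):
    # Scale to unit L2 norm; leave all-zero vectors unchanged.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def encode_text(text):
    # Stand-in for the LSTM text encoder: hashed bag-of-words.
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[sum(ord(ch) for ch in token) % DIM] += 1.0
    return _normalise(vec)


def encode_image(pixels):
    # Stand-in for the CNN encoder: mean intensity over DIM chunks of the image.
    flat = [p for row in pixels for p in row]
    chunk = max(1, len(flat) // DIM)
    vec = []
    for i in range(DIM):
        part = flat[i * chunk:(i + 1) * chunk]
        vec.append(sum(part) / len(part) if part else 0.0)
    return _normalise(vec)


def fuse(text_vec, image_vec):
    # Assumed fusion: plain concatenation of the two feature vectors.
    return text_vec + image_vec


def command_probabilities(state, commands, weights):
    # Project the fused state into the command space, score each candidate
    # command by dot product, then apply a numerically stable softmax.
    projected = [sum(w * s for w, s in zip(row, state)) for row in weights]
    scores = [
        sum(p * c for p, c in zip(projected, encode_text(cmd)))
        for cmd in commands
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Untrained projection from the fused state (2 * DIM) to the command space (DIM).
weights = [[rng.uniform(-1, 1) for _ in range(2 * DIM)] for _ in range(DIM)]

observation = "You are in the kitchen. There is a fridge here."
hint = [[0.0, 0.5, 1.0, 0.5], [1.0, 0.0, 0.5, 0.0]]  # toy 2x4 grayscale image
commands = ["open fridge", "go east", "take knife"]

state = fuse(encode_text(observation), encode_image(hint))
probs = command_probabilities(state, commands, weights)
```

In an RL setting the projection weights (and the encoders themselves) would be learned from reward rather than drawn at random. The sketch only shows how both modalities enter the command distribution.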
