Agents that Listen: High-Throughput Reinforcement Learning with Multiple Sensory Systems

07/05/2021
by Shashank Hegde et al.

Humans and other intelligent animals have evolved highly sophisticated perception systems that combine multiple sensory modalities. In contrast, state-of-the-art artificial agents rely mostly on visual inputs or on structured low-dimensional observations provided by instrumented environments. Learning to act from combined visual and auditory inputs is still a young research topic that has not been explored beyond simple scenarios. To facilitate progress in this area, we introduce a new version of the ViZDoom simulator that provides a highly efficient learning environment with raw audio observations. We study the performance of different model architectures on a series of tasks that require the agent to recognize sounds and execute instructions given in natural language. Finally, we train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary. We are currently in the process of merging the augmented simulator into the main ViZDoom code repository. Video demonstrations and experiment code can be found at https://sites.google.com/view/sound-rl.
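For illustration, here is a minimal sketch of an agent loop consuming the combined visual and audio observations the augmented simulator exposes. Since the merge into ViZDoom was still in progress at the time of writing, the audio-related names below (set_audio_buffer_enabled, set_audio_sampling_rate, SamplingRate.SR_22050, state.audio_buffer) are assumptions modeled on ViZDoom's existing buffer API and may differ in the released version:

    import vizdoom as vzd

    game = vzd.DoomGame()
    game.load_config("basic.cfg")  # path to any standard ViZDoom scenario config

    # Request raw audio observations alongside the usual screen buffer.
    # These two calls are assumed names for the audio extension.
    game.set_audio_buffer_enabled(True)
    game.set_audio_sampling_rate(vzd.SamplingRate.SR_22050)

    game.init()
    game.new_episode()

    n_buttons = game.get_available_buttons_size()
    while not game.is_episode_finished():
        state = game.get_state()
        frame = state.screen_buffer  # visual observation (numpy array)
        audio = state.audio_buffer   # assumed: raw waveform for the most recent tics

        # A trained policy would map (frame, audio) to an action;
        # here we simply press no buttons.
        game.make_action([0] * n_buttons)

    game.close()

One natural design choice, when the agent acts only every few simulation tics, is to buffer the audio across the skipped tics so the policy hears a continuous waveform rather than isolated slices.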
