From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

05/31/2023
by Peter Shaw, et al.

Much of the previous work towards digital agents for graphical user interfaces (GUIs) has relied on text-based representations (derived from HTML or other structured data sources), which are not always readily available. These input representations have often been coupled with custom, task-specific action spaces. This paper focuses on creating agents that interact with the digital world using the same conceptual interface that humans commonly use: pixel-based screenshots as input and a generic action space corresponding to keyboard and mouse actions. Building upon recent progress in pixel-based pretraining, we show, for the first time, that it is possible for such agents to outperform human crowdworkers on the MiniWob++ benchmark of GUI-based instruction following tasks.

research
03/15/2023

Lana: A Language-Capable Navigator for Instruction Following and Generation

Recently, visual-language navigation (VLN) – entailing robot agents to f...
research
02/06/2023

Challenges and Opportunities of Content Optimization for Freeform User Interfaces

While recent innovations on shape technologies allow for the creation of...
research
06/01/2023

STEVE-1: A Generative Model for Text-to-Behavior in Minecraft

Constructing AI models that respond to text instructions is challenging,...
research
11/08/2022

Learning to Follow Instructions in Text-Based Games

Text-based games present a unique class of sequential decision making pr...
research
08/26/2015

Alignment-based compositional semantics for instruction following

This paper describes an alignment-based model for interpreting natural l...
research
07/19/2023

Android in the Wild: A Large-Scale Dataset for Android Device Control

There is a growing interest in device-control systems that can interpret...
research
09/20/2023

You Only Look at Screens: Multimodal Chain-of-Action Agents

Autonomous user interface (UI) agents aim to facilitate task automation ...
