Mind2Web: Towards a Generalist Agent for the Web

by Xiang Deng, et al.

We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or cover only a limited set of websites and tasks, and are thus not suitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains and crowdsourced action sequences for the tasks, Mind2Web provides three necessary ingredients for building generalist web agents: 1) diverse domains, websites, and tasks, 2) use of real-world websites instead of simulated and simplified ones, and 3) a broad spectrum of user interaction patterns. Based on Mind2Web, we conduct an initial exploration of using large language models (LLMs) for building generalist web agents. While the raw HTML of real-world websites is often too large to be fed to LLMs, we show that first filtering it with a small LM significantly improves the effectiveness and efficiency of LLMs. Our solution demonstrates a decent level of performance, even on websites or entire domains the model has never seen before, but there is still substantial room for improvement towards truly generalizable agents. We open-source our dataset, model implementation, and trained models (https://osu-nlp-group.github.io/Mind2Web) to facilitate further research on building a generalist agent for the web.
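
The filter-then-prompt idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the token-overlap scorer below is a hypothetical stand-in for the small LM, and the names `overlap_score`, `filter_elements`, and `build_prompt`, the cutoff `k`, and the prompt layout are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(task: str, element: str) -> float:
    """Hypothetical stand-in for a small LM's relevance score.

    Returns the fraction of task tokens that also occur in the element.
    """
    task_tokens = tokenize(task)
    if not task_tokens:
        return 0.0
    return len(task_tokens & tokenize(element)) / len(task_tokens)


def filter_elements(
    task: str,
    elements: List[str],
    k: int = 2,
    score: Callable[[str, str], float] = overlap_score,
) -> List[str]:
    """Keep the k elements the scorer ranks as most relevant to the task."""
    ranked = sorted(elements, key=lambda e: score(task, e), reverse=True)
    return ranked[:k]


def build_prompt(task: str, candidates: List[str]) -> str:
    """Assemble a short prompt from the filtered candidates (assumed layout)."""
    lines = [f"Task: {task}", "Candidate elements:"]
    lines += [f"{i}. {c}" for i, c in enumerate(candidates)]
    return "\n".join(lines)


task = "search for flights to Boston"
elements = [
    '<a href="/about">About us</a>',
    '<input id="q" placeholder="Search flights">',
    "<button>Search</button>",
    "<footer>Copyright 2023</footer>",
]
top = filter_elements(task, elements, k=2)
prompt = build_prompt(task, top)
```

Here only two of the four elements reach the prompt, so the downstream LLM sees a much shorter input than the full page. Swapping `overlap_score` for a learned ranker would leave the rest of the pipeline unchanged.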




