Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

by Christopher Clark, et al.

Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multimodal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative drawing-and-guessing game based on Pictionary that poses a novel challenge for the research community. In Iconary, a Drawer depicts a phrase by composing icons, a Guesser tries to identify the phrase, and the Drawer iteratively revises the drawing in response to the Guesser's guesses. This back-and-forth often uses canonical scenes, visual metaphors, or icon compositions to express challenging words, making it an ideal test of how AI mixes language with visual and symbolic communication. We propose models to play Iconary and train them on over 55,000 games between human players. Our models are skillful players and can use the world knowledge in language models to play with words unseen during training. Elite human players outperform our models, particularly at the drawing task, leaving an important gap for future research to address. We release our dataset, code, and evaluation setup as a challenge to the community at http://www.github.com/allenai/iconary.

