Chat2VIS: Generating Data Visualisations via Natural Language using ChatGPT, Codex and GPT-3 Large Language Models

by Paula Maddigan, et al.

The field of data visualisation has long aimed to devise solutions for generating visualisations directly from natural language text. Research in Natural Language Interfaces (NLIs) has contributed towards the development of such techniques. However, implementing workable NLIs has always been challenging, both because natural language is inherently ambiguous and because unclear or poorly written user queries make it hard for existing language models to discern user intent. Instead of pursuing the usual path of developing new iterations of language models, this study proposes leveraging the advancements in pre-trained large language models (LLMs) such as ChatGPT and GPT-3 to convert free-form natural language directly into code for appropriate visualisations. This paper presents a novel system, Chat2VIS, which takes advantage of the capabilities of LLMs and demonstrates how, with effective prompt engineering, the complex problem of language understanding can be solved more efficiently, resulting in simpler and more accurate end-to-end solutions than prior approaches. Chat2VIS shows that LLMs, together with the proposed prompts, offer a reliable approach to rendering visualisations from natural language queries, even when queries are highly misspecified and underspecified. This solution also significantly reduces the cost of developing NLI systems, while attaining greater visualisation inference abilities than traditional NLP approaches that use hand-crafted grammar rules and tailored models. This study also shows how LLM prompts can be constructed in a way that preserves data security and privacy while being generalisable to different datasets. Finally, this work compares the performance of GPT-3, Codex and ChatGPT across a number of case studies and contrasts their performance with prior studies.
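One way to read the claim that prompts can preserve privacy while generalising across datasets is that the prompt carries only schema metadata (column names and types) and never the data values themselves. The sketch below illustrates that idea under stated assumptions: the function name `build_prompt`, the prompt wording, and the example columns are all hypothetical and are not taken from the paper.

```python
from typing import Mapping


def build_prompt(df_name: str, columns: Mapping[str, str], query: str) -> str:
    """Assemble a code-completion prompt from schema metadata only.

    Hypothetical helper: no row values are ever placed in the prompt,
    so the dataset contents stay local.
    """
    # Keep the user's text from closing the description string early.
    safe_query = query.replace('"""', "'''")

    # Describe each column by name and type, not by its contents.
    schema_lines = [f"- '{name}' (type: {dtype})" for name, dtype in columns.items()]

    description = "\n".join([
        '"""',
        f"Use a dataframe called {df_name} with columns:",
        *schema_lines,
        "Label the x and y axes appropriately. Add a title.",
        f"Using Python, write a script that uses {df_name} to graph: {safe_query}",
        '"""',
    ])

    # Start the script so the model continues it rather than replying in prose.
    code_stub = "\n".join([
        "import pandas as pd",
        "import matplotlib.pyplot as plt",
        "fig, ax = plt.subplots()",
        f"{df_name} = {df_name}.copy()",
    ])
    return description + "\n" + code_stub + "\n"


# Hypothetical schema for illustration; only names and types are shared.
columns = {"title": "object", "year": "int64", "gross": "float64"}
prompt = build_prompt("df", columns, "average gross by year")
print(prompt)
```

Because the prompt depends only on a name-to-type mapping, the same function works unchanged for any tabular dataset. The completion returned by the model would still need to be reviewed before being executed against the real data.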




