Large Language Models are Pretty Good Zero-Shot Video Game Bug Detectors

by Mohammad Reza Taesiri, et al.

Video game testing requires game-specific knowledge as well as common sense reasoning about the events in the game. While AI-driven agents can satisfy the first requirement, it is not yet possible to meet the second requirement automatically. Therefore, video game testing often still relies on manual testing, and human testers are required to play the game thoroughly to detect bugs. As a result, it is challenging to fully automate game testing. In this study, we explore the possibility of leveraging the zero-shot capabilities of large language models for video game bug detection. By formulating the bug detection problem as a question-answering task, we show that large language models can identify which event is buggy in a sequence of textual descriptions of events from a game. To this end, we introduce the GameBugDescriptions benchmark dataset, which consists of 167 buggy gameplay videos and a total of 334 question-answer pairs across 8 games. We extensively evaluate the performance of six models across the OPT and InstructGPT large language model families on our benchmark dataset. Our results are promising for employing language models to detect video game bugs. With the proper prompting technique, we could achieve an accuracy of 70.66 to 78.94
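To make the question-answering formulation concrete, the following minimal sketch turns a sequence of event descriptions into a single zero-shot prompt and extracts an event number from a model's free-text answer. It is an illustration only: the prompt wording, the function names `build_bug_question` and `parse_event_number`, and the sample events are assumptions made here and are not taken from the paper or from the GameBugDescriptions dataset. No model is called; the answer string would come from whichever language model is being evaluated.

```python
import re
from typing import List, Optional


def build_bug_question(events: List[str]) -> str:
    """Format ordered event descriptions as one question-answering prompt.

    The wording below is a hypothetical template, not the paper's prompt.
    """
    numbered = "\n".join(f"{i}. {event}" for i, event in enumerate(events, start=1))
    return (
        "The following events happened in a video game, in order:\n"
        f"{numbered}\n"
        "Q: Which event is a bug? Answer with the event number.\n"
        "A:"
    )


def parse_event_number(answer: str, num_events: int) -> Optional[int]:
    """Return the first in-range event number found in the answer, or None."""
    for match in re.findall(r"\d+", answer):
        n = int(match)
        if 1 <= n <= num_events:
            return n
    return None


# Hypothetical event sequence, invented for illustration.
sample_events = [
    "The player gets into a car.",
    "The car drives down the road.",
    "The car flies into the sky after touching a lamp post.",
]
prompt = build_bug_question(sample_events)
```

The prompt string would be sent to a language model, and `parse_event_number` applied to its reply; comparing the parsed number with the labelled buggy event gives a per-question correctness signal from which accuracy can be computed.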




Garbage in, garbage out: Zero-shot detection of crime using Large Language Models

This paper proposes exploiting the common sense knowledge learned by lar...

CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning

Gameplay videos contain rich information about how players interact with...

Learning to Identify Perceptual Bugs in 3D Video Games

Automated Bug Detection (ABD) in video games is composed of two distinct...

Testing the Ability of Language Models to Interpret Figurative Language

Figurative and metaphorical language are commonplace in discourse, and f...

Enhancing the Monte Carlo Tree Search Algorithm for Video Game Testing

In this paper, we study the effects of several Monte Carlo Tree Search (...

RAPGen: An Approach for Fixing Code Inefficiencies in Zero-Shot

Performance bugs are non-functional bugs that can even manifest in well-...

What do Large Language Models Learn about Scripts?

Script Knowledge (Schank and Abelson, 1975) has long been recognized as ...
