Evaluating AIGC Detectors on Code Content

by Jian Wang, et al.

Artificial Intelligence Generated Content (AIGC) has garnered considerable attention for its impressive performance, with ChatGPT emerging as a leading AIGC model that produces high-quality responses across various applications, including software development and maintenance. Despite its potential, the misuse of ChatGPT poses significant concerns, especially in education and safety-critical domains. Numerous AIGC detectors have been developed and evaluated on natural language data. However, their performance on code-related content generated by ChatGPT remains unexplored. To fill this gap, we present the first empirical study evaluating existing AIGC detectors in the software domain. We created a comprehensive dataset of 492.5K samples of code-related content produced by ChatGPT, covering popular software activities such as Q&A (115K), code summarization (126K), and code generation (226.5K). We evaluated six AIGC detectors on this dataset: three commercial and three open-source solutions. Additionally, we conducted a human study to understand human detection capabilities and compare them with the existing AIGC detectors. Our results indicate that AIGC detectors perform worse on code-related data than on natural language data. Fine-tuning can enhance detector performance, especially for content within the same domain, but generalization remains a challenge. The human study shows that people also find it difficult to detect AI-generated code-related content.


