Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text

by Liam Dugan, et al.

As text generated by large language models proliferates, it becomes vital to understand how humans engage with such text, and whether or not they are able to detect when the text they are reading did not originate with a human writer. Prior work on human detection of generated text focuses on the case where an entire passage is either human-written or machine-generated. In this paper, we study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models. We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that given proper incentives, annotators can improve at this task over time. Furthermore, we conduct a detailed comparison study and analyze how a variety of variables (model size, decoding strategy, fine-tuning, prompt genre, etc.) affect human detection performance. Finally, we collect error annotations from our participants and use them to show that certain textual genres influence models to make different types of errors and that certain sentence-level features correlate highly with annotator selection. We release the RoFT dataset: a collection of over 21,000 human annotations paired with error classifications to encourage future work in human detection and evaluation of generated text.




