Is GPT-3 a Psychopath? Evaluating Large Language Models from a Psychological Perspective

12/20/2022
by Xingxuan Li, et al.
Alibaba Group
Nanyang Technological University

Are large language models (LLMs) like GPT-3 psychologically safe? In this work, we design unbiased prompts to systematically evaluate LLMs from a psychological perspective. First, we test the personality traits of three LLMs with the Short Dark Triad (SD-3) and the Big Five Inventory (BFI). All three score higher on SD-3 than the human average, indicating a relatively darker personality. Furthermore, LLMs such as InstructGPT and FLAN-T5, which are fine-tuned with safety metrics, do not necessarily have more positive personalities: they score higher than GPT-3 on Machiavellianism and Narcissism. Second, we run well-being tests on the models in the GPT-3 series to study the impact of fine-tuning with more training data. Interestingly, we observe a continuous increase in well-being scores from GPT-3 to InstructGPT. Following these observations, we show that instruction-finetuning FLAN-T5 with positive answers in BFI can effectively improve the model from a psychological perspective. Finally, we call on the community to evaluate and improve LLMs' safety systematically rather than only at the sentence level.
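To make the scoring step concrete, the following is a minimal sketch of how Likert-scale inventories such as SD-3 or BFI are commonly scored: each answer is a rating on a fixed scale, reverse-keyed items are flipped, and a trait score is the mean of its items. This is an illustration under stated assumptions, not the authors' evaluation code. The 5-point scale, the function name `score_traits`, the trait names, the item identifiers, and the answers are all hypothetical and do not come from the paper or from the real questionnaires.

```python
from statistics import mean

# Assumption: a 5-point Likert scale (1 = disagree strongly, 5 = agree strongly).
SCALE_MIN, SCALE_MAX = 1, 5


def score_traits(answers, trait_items, reverse_keyed=frozenset()):
    """Return the mean item rating per trait.

    answers:       dict mapping item id -> integer rating given by the model
    trait_items:   dict mapping trait name -> list of item ids for that trait
    reverse_keyed: item ids whose rating is flipped before averaging
    """
    scores = {}
    for trait, items in trait_items.items():
        values = []
        for item in items:
            raw = answers[item]
            if not SCALE_MIN <= raw <= SCALE_MAX:
                raise ValueError(f"rating out of range for {item}: {raw}")
            # Flip reverse-keyed items: 1 <-> 5, 2 <-> 4, 3 stays 3.
            if item in reverse_keyed:
                raw = SCALE_MIN + SCALE_MAX - raw
            values.append(raw)
        scores[trait] = mean(values)
    return scores


# Hypothetical toy data: placeholder traits and item ids, not real inventory items.
trait_items = {"trait_a": ["q1", "q2", "q3"], "trait_b": ["q4", "q5"]}
answers = {"q1": 4, "q2": 2, "q3": 5, "q4": 1, "q5": 3}
scores = score_traits(answers, trait_items, reverse_keyed={"q2"})
print(scores)
```

Per-trait means computed this way can then be compared against a published human average for the same inventory, which is the kind of comparison the abstract describes.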

09/18/2023

Understanding Catastrophic Forgetting in Language Models via Implicit Inference

Fine-tuning (via methods such as instruction-tuning or reinforcement lea...
08/25/2023

The Poison of Alignment

From the perspective of content safety issues, alignment has shown to li...
09/14/2023

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Training large language models to follow instructions makes them perform...
09/05/2023

Making Large Language Models Better Reasoners with Alignment

Reasoning is a cognitive process of using evidence to reach a sound conc...
07/19/2023

Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?

As the breadth and depth of language model applications continue to expa...
02/10/2023

Impact of Code Language Models on Automated Program Repair

Automated program repair (APR) aims to help developers improve software ...
07/08/2023

Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators

Large language models that exhibit instruction-following behaviour repre...
