Is GPT-3 a Psychopath? Evaluating Large Language Models from a Psychological Perspective

by Xingxuan Li, et al.
Alibaba Group
Nanyang Technological University

Are large language models (LLMs) like GPT-3 psychologically safe? In this work, we design unbiased prompts to evaluate LLMs systematically from a psychological perspective. First, we test the personality traits of three different LLMs using the Short Dark Triad (SD-3) and the Big Five Inventory (BFI). All three score higher on SD-3 than the human average, indicating a relatively darker personality. Furthermore, LLMs such as InstructGPT and FLAN-T5, which are fine-tuned with safety metrics, do not necessarily have more positive personalities: they score higher on Machiavellianism and Narcissism than GPT-3. Second, we run well-being tests on the LLMs in the GPT-3 series to study the impact of fine-tuning with more training data. Interestingly, we observe a continuous increase in well-being scores from GPT-3 to InstructGPT. Following these observations, we show that instruction-finetuning FLAN-T5 on positive BFI answers can effectively improve the model from a psychological perspective. Finally, we call on the community to evaluate and improve the safety of LLMs systematically rather than at the sentence level only.
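
To make the scoring step concrete, here is a minimal sketch of how Likert-style questionnaire answers could be aggregated into per-trait scores. It is not the authors' code. The 1-to-5 answer scale, the item identifiers (q1 to q4), the item-to-trait mapping, the reverse-keyed item, and the function name score_traits are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Assumed answer scale (1 = strongly disagree, 5 = strongly agree).
LIKERT_MIN, LIKERT_MAX = 1, 5


def score_traits(responses, item_traits, reversed_items=frozenset()):
    """Average Likert answers per trait, flipping reverse-keyed items."""
    by_trait = {}
    for item_id, answer in responses.items():
        if not LIKERT_MIN <= answer <= LIKERT_MAX:
            raise ValueError(f"answer for {item_id} is off the scale: {answer}")
        if item_id in reversed_items:
            # On a 1-5 scale, reverse keying maps 1 -> 5, 2 -> 4, and so on.
            answer = LIKERT_MAX + LIKERT_MIN - answer
        by_trait.setdefault(item_traits[item_id], []).append(answer)
    return {trait: mean(values) for trait, values in by_trait.items()}


# Hypothetical items and answers; these are not real questionnaire items.
item_traits = {
    "q1": "Machiavellianism",
    "q2": "Machiavellianism",
    "q3": "Narcissism",
    "q4": "Narcissism",
}
responses = {"q1": 4, "q2": 2, "q3": 5, "q4": 1}

scores = score_traits(responses, item_traits, reversed_items={"q4"})
print(scores)
```

Per-trait averages computed this way for a model could then be compared against a human reference average for the same questionnaire, which is the kind of comparison the abstract describes.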


Understanding Catastrophic Forgetting in Language Models via Implicit Inference

Fine-tuning (via methods such as instruction-tuning or reinforcement lea...

The Poison of Alignment

From the perspective of content safety issues, alignment has shown to li...

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Training large language models to follow instructions makes them perform...

Making Large Language Models Better Reasoners with Alignment

Reasoning is a cognitive process of using evidence to reach a sound conc...

Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?

As the breadth and depth of language model applications continue to expa...

Impact of Code Language Models on Automated Program Repair

Automated program repair (APR) aims to help developers improve software ...

Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators

Large language models that exhibit instruction-following behaviour repre...
