Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity

by Gabriel Simmons, et al.

Large Language Models (LLMs) have recently demonstrated impressive capability in generating fluent text. LLMs have also shown an alarming tendency to reproduce social biases, for example stereotypical associations between gender and occupation or race and criminal behavior. Like race and gender, morality is an important social variable; our moral biases affect how we receive other people and their arguments. I anticipate that the apparent moral capabilities of LLMs will play an important role in their effects on the human social environment. This work investigates whether LLMs reproduce the moral biases associated with political groups, a capability I refer to as moral mimicry. I explore this hypothesis in GPT-3, a 175B-parameter language model based on the Transformer architecture, using tools from Moral Foundations Theory to measure the moral content in text generated by the model following prompting with liberal and conservative political identities. The results demonstrate that large language models are indeed moral mimics; when prompted with a political identity, GPT-3 generates text reflecting the corresponding moral biases. Moral mimicry could contribute to fostering understanding between social groups via moral reframing. Worryingly, it could also reinforce polarized views, exacerbating existing social challenges. I hope that this work encourages further investigation of the moral mimicry capability, including how to leverage it for social good and minimize its risks.




Evaluating Biased Attitude Associations of Language Models in an Intersectional Context

Language models are trained on large-scale corpora that embed implicit b...

FairPy: A Toolkit for Evaluation of Social Biases and their Mitigation in Large Language Models

Studies have shown that large pretrained language models exhibit biases ...

AI in the Gray: Exploring Moderation Policies in Dialogic Large Language Models vs. Human Answers in Controversial Topics

The introduction of ChatGPT and the subsequent improvement of Large Lang...

Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models

Large Language Models (LLMs) are trained primarily on minimally processe...

Large Language Models Can Be Used to Scale the Ideologies of Politicians in a Zero-Shot Learning Setting

The aggregation of knowledge embedded in large language models (LLMs) ho...

SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models

Stereotype benchmark datasets are crucial to detect and mitigate social ...

How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases

The capabilities of natural language models trained on large-scale data ...
