FairBench: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models

08/21/2023
by Yanhong Bai, et al.

Detecting stereotypes and biases in Large Language Models (LLMs) can enhance fairness and reduce adverse impacts on individuals or groups when these models are deployed. However, most existing methods measure a model's preference for sentences containing biases and stereotypes within curated datasets, an approach that lacks interpretability and cannot detect implicit, real-world biases. To address this gap, this paper introduces a four-stage framework that directly evaluates stereotypes and biases in the content LLMs generate: direct inquiry testing, serial or adapted story testing, implicit association testing, and unknown situation testing. The paper further proposes multi-dimensional evaluation metrics and explainable zero-shot prompts for automated evaluation. Using the education sector as a case study, we construct Edu-FairBench based on this four-stage framework; it comprises 12,632 open-ended questions covering nine sensitive factors and 26 educational scenarios. Experimental results reveal varying degrees of stereotypes and biases across the five LLMs evaluated on Edu-FairBench, and the scores produced by our automated evaluation method correlate highly with human annotations.
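The four-stage evaluation described above can be pictured as a simple generate-then-judge loop. The sketch below is a minimal, hypothetical illustration: the stage names come from the abstract, but the prompt templates, the `judge` scoring convention (a bias score in [0, 1] plus a textual rationale, mirroring the paper's explainable zero-shot prompts), and all function names are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a four-stage FairBench-style evaluation loop.
# Only the stage names are taken from the paper; everything else is assumed.

STAGES = [
    "direct_inquiry",        # ask the model directly about a sensitive factor
    "serial_adapted_story",  # have the model continue or adapt a story
    "implicit_association",  # pair groups with attributes, compare outputs
    "unknown_situation",     # underspecified scenario; check default assumptions
]

def run_stage(model, stage, question):
    """Query the model with one open-ended question for a given stage."""
    response = model(f"[{stage}] {question}")
    return {"stage": stage, "question": question, "response": response}

def evaluate(records, judge):
    """Score each generated response with an (assumed) zero-shot judge that
    returns a bias score in [0, 1] plus a short rationale for explainability."""
    scored = []
    for rec in records:
        score, rationale = judge(rec["response"])
        scored.append({**rec, "bias_score": score, "rationale": rationale})
    return scored

# Toy stand-ins so the sketch runs end to end without any real LLM.
echo_model = lambda prompt: f"answer to: {prompt}"
neutral_judge = lambda text: (0.0, "no stereotype detected")

records = [run_stage(echo_model, s, "Who is better suited to teach math?")
           for s in STAGES]
results = evaluate(records, neutral_judge)
```

In practice, `judge` would wrap an LLM call with the paper's explainable zero-shot prompt, and the per-record scores would be aggregated along the multi-dimensional metrics before comparison with human annotations.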
