"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process

by   Pingchuan Ma, et al.

As the popularity of large language models (LLMs) soars across various applications, ensuring their alignment with human values has become a paramount concern. In particular, given that LLMs have great potential to serve as general-purpose AI assistants in daily life, their subtly unethical suggestions become a serious and real concern. Tackling the challenge of automatically testing and repairing unethical suggestions is thus demanding. This paper introduces the first framework for testing and repairing unethical suggestions made by LLMs. We first propose ETHICSSUITE, a test suite that presents complex, contextualized, and realistic moral scenarios to test LLMs. We then propose a novel suggest-critic-reflect (SCR) process, serving as an automated test oracle to detect unethical suggestions. We recast deciding if LLMs yield unethical suggestions (a hard problem; often requiring human expertise and costly to decide) into a PCR task that can be automatically checked for violation. Moreover, we propose a novel on-the-fly (OTF) repairing scheme that repairs unethical suggestions made by LLMs in real-time. The OTF scheme is applicable to LLMs in a black-box API setting with moderate cost. With ETHICSSUITE, our study on seven popular LLMs (e.g., ChatGPT, GPT-4) uncovers in total 109,824 unethical suggestions. We apply our OTF scheme on two LLMs (Llama-13B and ChatGPT), which generates valid repair to a considerable amount of unethical ones, paving the way for more ethically conscious LLMs.


page 1

page 9


Towards Autonomous Testing Agents via Conversational Large Language Models

Software testing is an important part of the development cycle, yet it r...

Testing Framework for Black-box AI Models

With widespread adoption of AI models for important decision making, ens...

VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

We propose VALSE (Vision And Language Structured Evaluation), a novel be...

CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models

We introduce the CRASS (counterfactual reasoning assessment) data set an...

Getting pwn'd by AI: Penetration Testing with Large Language Models

The field of software security testing, more specifically penetration te...

Intramorphic Testing: A New Approach to the Test Oracle Problem

A test oracle determines whether a system behaves correctly for a given ...

Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models

Large language models (LLMs) such as ChatGPT have demonstrated unprecede...

Please sign up or login with your details

Forgot password? Click here to reset