On Measuring Social Biases in Prompt-Based Multi-Task Learning

by   Afra Feyza Akyürek, et al.

Large language models trained on a mixture of NLP tasks that are converted into a text-to-text format using prompts, can generalize into novel forms of language and handle novel tasks. A large body of work within prompt engineering attempts to understand the effects of input forms and prompts in achieving superior performance. We consider an alternative measure and inquire whether the way in which an input is encoded affects social biases promoted in outputs. In this paper, we study T0, a large-scale multi-task text-to-text language model trained using prompt-based learning. We consider two different forms of semantically equivalent inputs: question-answer format and premise-hypothesis format. We use an existing bias benchmark for the former BBQ and create the first bias benchmark in natural language inference BBNLI with hand-written hypotheses while also converting each benchmark into the other form. The results on two benchmarks suggest that given two different formulations of essentially the same input, T0 conspicuously acts more biased in question answering form, which is seen during training, compared to premise-hypothesis form which is unlike its training examples. Code and data are released under https://github.com/feyzaakyurek/bbnli.


page 1

page 2

page 3

page 4


Challenges in Measuring Bias via Open-Ended Language Generation

Researchers have devised numerous ways to quantify social biases vested ...

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

Pretrained language models, especially masked language models (MLMs) hav...

MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization

Recently, large-scale datasets have vastly facilitated the development i...

Coupling Large Language Models with Logic Programming for Robust and General Reasoning from Text

While large language models (LLMs), such as GPT-3, appear to be robust a...

BBQ: A Hand-Built Bias Benchmark for Question Answering

It is well documented that NLP models learn social biases present in the...

LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework

While Large Language Models (LLMs) have demonstrated commendable perform...

What's in your Head? Emergent Behaviour in Multi-Task Transformer Models

The primary paradigm for multi-task training in natural language process...

Please sign up or login with your details

Forgot password? Click here to reset