Pathologies of Pre-trained Language Models in Few-shot Fine-tuning

04/17/2022
by Hanjie Chen, et al.

Although adapting pre-trained language models with few examples has shown promising performance on text classification, little is understood about where the performance gain comes from. In this work, we propose to answer this question by interpreting the adaptation behavior using post-hoc explanations of model predictions. By modeling the feature statistics of these explanations, we discover that (1) without fine-tuning, pre-trained models (e.g., BERT and RoBERTa) show strong prediction bias across labels; (2) although few-shot fine-tuning mitigates this prediction bias and yields promising prediction performance, our analysis shows that the models gain the improvement by capturing non-task-related features (e.g., stop words) or shallow data patterns (e.g., lexical overlaps). These observations caution that pursuing model performance with fewer examples may incur pathological prediction behavior, calling for further sanity checks on model predictions and careful design of model evaluations in few-shot fine-tuning.
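As a rough illustration of the kind of analysis the abstract describes, the sketch below probes a pre-trained classifier for prediction bias across labels and measures how much post-hoc attribution mass lands on stop words. The backbone name, the toy inputs, the stop-word list, and the input-gradient saliency method are illustrative assumptions, not the authors' exact setup.

```python
import torch
from collections import Counter
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumed backbone; the paper also studies RoBERTa
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "."}  # toy list

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Classification head is randomly initialized here; in practice this would be
# the prompted or few-shot fine-tuned model under study.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def predict_and_explain(text):
    """Predict a label and compute per-token saliency via |gradient x input|."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    # Detach the token embeddings into a leaf tensor so gradients are retained on it.
    embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
    embeds.requires_grad_(True)
    logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
    pred = logits.argmax(dim=-1).item()
    logits[0, pred].backward()
    saliency = (embeds.grad * embeds).abs().sum(dim=-1).squeeze(0)  # one score per token
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return pred, tokens, saliency

texts = [
    "a gripping , well acted thriller .",
    "the plot is a mess and the pacing drags .",
]

label_counts = Counter()   # (1) prediction bias: how skewed are the predicted labels?
stop_word_shares = []      # (2) feature statistics: attribution mass on stop words
for text in texts:
    pred, tokens, sal = predict_and_explain(text)
    label_counts[pred] += 1
    total = sal.sum().item() or 1.0
    stop_mass = sum(s.item() for tok, s in zip(tokens, sal) if tok in STOP_WORDS)
    stop_word_shares.append(stop_mass / total)

print("predicted-label distribution:", dict(label_counts))
print("mean attribution share on stop words:",
      sum(stop_word_shares) / len(stop_word_shares))
```

A skewed label distribution without fine-tuning, or a large attribution share on stop words after few-shot fine-tuning, would correspond to the two pathologies the abstract reports.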


06/18/2021
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
We show that with small-to-medium training data, fine-tuning only the bi...

05/22/2023
SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations
Explaining the decisions of neural models is crucial for ensuring their ...

05/11/2022
Making Pre-trained Language Models Good Long-tailed Learners
Prompt-tuning has shown appealing performance in few-shot classification...

03/20/2022
Cluster & Tune: Boost Cold Start Performance in Text Classification
In real-world scenarios, a text classification task often begins with a ...

03/16/2022
Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again
The strong few-shot in-context learning capability of large pre-trained ...

09/08/2022
IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach
In this paper, we describe our participation in the subtask 1 of CASE-20...

05/31/2023
Exploring Lottery Prompts for Pre-trained Language Models
Consistently scaling pre-trained language models (PLMs) imposes substant...
