To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models

10/06/2022
by Julius Gonsior, et al.

Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-based language models still requires a significant amount of labeled data. A well-known technique for reducing the human effort of acquiring a labeled dataset is Active Learning (AL): an iterative process in which only the minimal number of samples is labeled. AL strategies require access to a quantified confidence measure of the model's predictions; a common choice is the softmax activation function of the final layer. Because the softmax function provides misleading probabilities, this paper compares eight alternatives on seven datasets. Our almost paradoxical finding is that most of the methods are too good at identifying the truly most uncertain samples (outliers), and that labeling exclusively such outliers therefore results in worse performance. As a heuristic, we propose to systematically ignore the most uncertain samples, which improves various methods compared to the softmax function.
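To make the role of the confidence measure concrete, the following is a minimal sketch (not code from the paper) of least-confidence uncertainty sampling on softmax outputs, together with the kind of "ignore the most uncertain samples" heuristic the abstract describes. The function names, the skip parameter, and the toy data are illustrative assumptions.

import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over class logits (numerically stabilized)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def least_confidence(probs: np.ndarray) -> np.ndarray:
    """Uncertainty = 1 - max class probability; higher means more uncertain."""
    return 1.0 - probs.max(axis=1)

def select_for_labeling(logits: np.ndarray, batch_size: int, skip: int = 0) -> np.ndarray:
    """Pick `batch_size` uncertain unlabeled samples for annotation.

    `skip` (a hypothetical parameter) discards the top-`skip` most
    uncertain samples first, reflecting the paper's finding that the
    very most uncertain samples are often outliers whose labels do
    not help the model.
    """
    uncertainty = least_confidence(softmax(logits))
    ranked = np.argsort(-uncertainty)        # indices, most uncertain first
    return ranked[skip:skip + batch_size]    # drop presumed outliers, then query

# Toy usage: 6 unlabeled samples, 3 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 3))
print(select_for_labeling(logits, batch_size=2, skip=1))

With skip=0 this is plain least-confidence sampling; a positive skip is one simple way to avoid querying only outliers, in the spirit of the heuristic proposed above.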


Related research

04/16/2021 · Bayesian Active Learning with Pretrained Language Models
Active Learning (AL) is a method to iteratively select data for annotati...

07/12/2021 · Uncertainty-based Query Strategies for Active Learning with Transformers
Active learning is the iterative construction of a classification model ...

09/26/2021 · Improving Question Answering Performance Using Knowledge Distillation and Active Learning
Contemporary question answering (QA) systems, including transformer-base...

05/16/2023 · On Dataset Transferability in Active Learning for Transformers
Active learning (AL) aims to reduce labeling costs by querying the examp...

04/11/2023 · r-softmax: Generalized Softmax with Controllable Sparsity Rate
Nowadays artificial neural network models achieve remarkable results in ...

06/22/2023 · Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Transformer models have been widely adopted in various domains over the ...

05/02/2022 · Simple Techniques Work Surprisingly Well for Neural Network Test Prioritization and Active Learning (Replicability Study)
Test Input Prioritizers (TIP) for Deep Neural Networks (DNN) are an impo...
