Making Pre-trained Language Models both Task-solvers and Self-calibrators

by   Yangyi Chen, et al.

Pre-trained language models (PLMs) serve as backbones for various real-world systems. For high-stake applications, it's equally essential to have reasonable confidence estimations in predictions. While the vanilla confidence scores of PLMs can already be effectively utilized, PLMs consistently become overconfident in their wrong predictions, which is not desirable in practice. Previous work shows that introducing an extra calibration task can mitigate this issue. The basic idea involves acquiring additional data to train models in predicting the confidence of their initial predictions. However, it only demonstrates the feasibility of this kind of method, assuming that there are abundant extra available samples for the introduced calibration task. In this work, we consider the practical scenario that we need to effectively utilize training samples to make PLMs both task-solvers and self-calibrators. Three challenges are presented, including limited training samples, data imbalance, and distribution shifts. We first conduct pilot experiments to quantify various decisive factors in the calibration task. Based on the empirical analysis results, we propose a training algorithm LM-TOAST to tackle the challenges. Experimental results show that LM-TOAST can effectively utilize the training data to make PLMs have reasonable confidence estimations while maintaining the original task performance. Further, we consider three downstream applications, namely selective classification, adversarial defense, and model cascading, to show the practical usefulness of LM-TOAST. The code will be made public at <>.


page 1

page 2

page 3

page 4


A Close Look into the Calibration of Pre-trained Language Models

Pre-trained language models (PLMs) achieve remarkable performance on man...

Two Sides of Miscalibration: Identifying Over and Under-Confidence Prediction for Network Calibration

Proper confidence calibration of deep neural networks is essential for r...

Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation

Large pre-trained language models achieve impressive results across many...

A Data Cartography based MixUp for Pre-trained Language Models

MixUp is a data augmentation strategy where additional samples are gener...

Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data

Fine-tuned pre-trained language models can suffer from severe miscalibra...

Multi-Task Learning based Online Dialogic Instruction Detection with Pre-trained Language Models

In this work, we study computational approaches to detect online dialogi...

Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection

Deep neural networks (DNNs) have enabled astounding progress in several ...

Please sign up or login with your details

Forgot password? Click here to reset