Label Smoothing is Robustification against Model Misspecification

05/15/2023
by Ryoya Yamasaki, et al.

Label smoothing (LS) adopts smoothed targets in classification tasks. For example, in binary classification, instead of the one-hot target (1,0)^⊤ used in conventional logistic regression (LR), LR with LS (LSLR) uses the smoothed target (1-α/2,α/2)^⊤ with a smoothing level α∈(0,1), which squeezes the values of the logits. In contrast to the common regularization-based interpretation of LS, which leads to an inconsistent probability estimator, we regard LSLR as modifying both the loss function and the consistent estimator used for probability estimation. To study the significance of each of these two modifications, we introduce a modified LSLR (MLSLR) that uses the same loss function as LSLR and the same consistent estimator as LR, and thus does not squeeze the logits. Regarding the loss function modification, we theoretically show that MLSLR with a larger smoothing level has lower efficiency under correctly-specified models, while it exhibits higher robustness against model misspecification than LR. Regarding the modification of the probability estimator, an experimental comparison between LSLR and MLSLR shows that this modification, together with the squeezing of the logits in LSLR, degrades both probability estimation and classification performance. The understanding of the properties of LS provided by these comparisons allows us to propose MLSLR as an improvement over LSLR.
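
As an illustration, the following minimal Python sketch (ours, not from the paper; the function names and the α value are illustrative assumptions) contrasts the one-hot target of LR with the smoothed target of LSLR, and shows the logit squeezing the abstract refers to:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_loss(z, y):
    # Standard logistic loss with a one-hot target, y in {0, 1}.
    p = sigmoid(z)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def lslr_loss(z, y, alpha):
    # Logistic loss against the smoothed target (1 - alpha/2, alpha/2).
    y_smooth = y * (1 - alpha / 2) + (1 - y) * (alpha / 2)
    p = sigmoid(z)
    return -(y_smooth * np.log(p) + (1 - y_smooth) * np.log(1 - p))

# Cross-entropy against a fixed target t is minimized at p = t, so for
# y = 1 the LSLR loss has a finite optimal logit z* = log((1-alpha/2)/(alpha/2)),
# whereas plain LR pushes the logit toward +infinity on separable data.
alpha = 0.2
z_star = np.log((1 - alpha / 2) / (alpha / 2))
print(z_star)  # ~2.20 for alpha = 0.2; larger alpha squeezes z* toward 0

The squeezing effect is visible directly in z*: as α grows, the optimal logit shrinks toward zero, which is the behavior MLSLR is designed to avoid while keeping the LSLR loss.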

Related research

02/12/2019
A Tunable Loss Function for Binary Classification
We present α-loss, α∈ [1,∞], a tunable loss function for binary classifi...

10/13/2017
On Integrated L^1 Convergence Rate of an Isotonic Regression Estimator for Multivariate Observations
We consider a general monotone regression estimation where we allow for ...

07/16/2019
The Bregman-Tweedie Classification Model
This work proposes the Bregman-Tweedie classification model and analyzes...

10/22/2022
Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation
Overconfidence has been shown to impair generalization and calibration o...

02/15/2021
Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification
Modern machine learning models with high accuracy are often miscalibrate...

07/23/2021
Similarity Based Label Smoothing For Dialogue Generation
Generative neural conversational systems are generally trained with the ...

03/04/2015
Class Probability Estimation via Differential Geometric Regularization
We study the problem of supervised learning for both binary and multicla...
