On the Role of Dataset Quality and Heterogeneity in Model Confidence

02/23/2020
by Yuan Zhao, et al.

Safety-critical applications require machine learning models that output accurate and calibrated probabilities. While uncalibrated deep networks are known to make over-confident predictions, it is unclear how model confidence is affected by variations in the data, such as label noise or class size. In this paper, we investigate the role of dataset quality by studying the impact of dataset size and label noise on model confidence. We theoretically explain and experimentally demonstrate that, surprisingly, label noise in the training data leads to under-confident networks, while reduced dataset size leads to over-confident models. We then study the impact of dataset heterogeneity, where data quality varies across classes, on model confidence. We demonstrate that such heterogeneity leads to heterogeneous confidence/accuracy behavior on the test data and is poorly handled by standard calibration algorithms. To overcome this, we propose an intuitive heterogeneous calibration technique and show that it improves calibration metrics (both average and worst-case errors) on the CIFAR datasets.
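The abstract does not spell out the proposed calibration procedure. As a rough illustration of what a heterogeneous scheme could look like, here is a minimal NumPy sketch that fits a separate temperature per predicted class on held-out validation logits, falling back to a single global temperature for sparsely populated classes. The function names, the grid-search optimizer, and the minimum-sample threshold are all assumptions made for this sketch, not the authors' method.

```python
# A minimal sketch of per-class ("heterogeneous") temperature scaling.
# Assumption: the paper's technique resembles fitting one temperature per
# class group on a validation set; this is illustrative, not the authors' code.
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def nll(logits, labels):
    # Average negative log-likelihood of the true labels.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.25, 4.0, 76)):
    # Pick the temperature that minimizes validation NLL (grid search
    # used here for simplicity; any 1-D optimizer would do).
    return min(grid, key=lambda t: nll(logits / t, labels))

def fit_per_class_temperatures(logits, labels, num_classes, min_samples=50):
    # One temperature per predicted class, with a global fallback for
    # classes with too few validation samples (min_samples is a heuristic).
    t_global = fit_temperature(logits, labels)
    preds = logits.argmax(axis=1)
    temps = np.full(num_classes, t_global)
    for c in range(num_classes):
        mask = preds == c
        if mask.sum() >= min_samples:
            temps[c] = fit_temperature(logits[mask], labels[mask])
    return temps

def calibrate(logits, temps):
    # Scale each example's logits by the temperature of its predicted class.
    preds = logits.argmax(axis=1)
    return softmax(logits / temps[preds, None])
```

Under this reading, per-class temperatures let classes whose training data differ in size or noise level receive different confidence adjustments, which a single global temperature cannot provide; the fallback guards against unstable fits on small groups and speaks to the worst-case (per-class) calibration error the abstract mentions.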
