Mitigating Transformer Overconfidence via Lipschitz Regularization

06/12/2023
by Wenqian Ye, et al.

Though Transformers have achieved promising results in many computer vision tasks, they tend to be overconfident in their predictions, since the standard Dot Product Self-Attention (DPSA) can barely preserve distance over an unbounded input domain. In this work, we fill this gap by proposing a novel Lipschitz Regularized Transformer (LRFormer). Specifically, we present a new similarity function based on a distance in Banach space to ensure Lipschitzness, and we further regularize this term with a contractive Lipschitz bound. The proposed method is analyzed with theoretical guarantees, providing a rigorous basis for its effectiveness and reliability. Extensive experiments on standard vision benchmarks demonstrate that our method outperforms state-of-the-art single-forward-pass approaches in prediction, calibration, and uncertainty estimation.
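The abstract's core idea is replacing the unbounded dot-product similarity with a distance-based one. As a rough illustration only, the sketch below swaps the dot product for a negative pairwise L2 distance (one choice of norm; the paper's exact Banach-space similarity, Lipschitz bound, and regularization are not reproduced here). The class name DistanceAttention, the sqrt(dim) temperature, and the single-head layout are all assumptions made for this example, not the authors' implementation.

import math
import torch
import torch.nn as nn


class DistanceAttention(nn.Module):
    """Single-head attention whose similarity is a negative pairwise distance
    rather than a dot product (an illustrative assumption, not LRFormer itself)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)

        # Pairwise L2 distances between queries and keys: (batch, tokens, tokens).
        dist = torch.cdist(q, k, p=2)

        # Negative distance serves as the similarity score; dividing by sqrt(dim)
        # mirrors the usual attention temperature (assumed, not from the paper).
        attn = torch.softmax(-dist / math.sqrt(self.dim), dim=-1)
        return attn @ v


if __name__ == "__main__":
    layer = DistanceAttention(dim=64)
    tokens = torch.randn(2, 16, 64)   # (batch, tokens, dim)
    print(layer(tokens).shape)        # torch.Size([2, 16, 64])

Because the scores are bounded by a distance rather than an unbounded inner product, this kind of similarity is more amenable to Lipschitz analysis, which is the property the paper's regularization builds on.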

Related research

The Lipschitz Constant of Self-Attention (06/08/2020)
Lipschitz constants of neural networks have been explored in various con...

Gaze Estimation using Transformer (05/30/2021)
Recent work has proven the effectiveness of transformers in many compute...

Augmented Shortcuts for Vision Transformers (06/30/2021)
Transformer models have achieved great progress on computer vision tasks...

E(2)-Equivariant Vision Transformer (06/11/2023)
Vision Transformer (ViT) has achieved remarkable performance in computer...

Pose-Oriented Transformer with Uncertainty-Guided Refinement for 2D-to-3D Human Pose Estimation (02/15/2023)
There has been a recent surge of interest in introducing transformers to...

AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation (01/19/2023)
Generative transformer models have become increasingly complex, with lar...
