ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging

08/05/2023
by Fangyuan Wang, et al.

The conventional recipe for Automatic Speech Recognition (ASR) models is to 1) train multiple checkpoints on a training set, relying on a validation set to prevent overfitting via early stopping, and 2) average either the last several checkpoints or those with the lowest validation losses to obtain the final model. In this paper, we rethink and update early stopping and checkpoint averaging from the perspective of the bias-variance tradeoff. Theoretically, bias and variance represent the fitness and variability of a model, and the tradeoff between them determines the overall generalization error. However, it is impractical to evaluate them precisely. As an alternative, we take the training loss and validation loss as proxies for bias and variance and guide early stopping and checkpoint averaging using their tradeoff, namely an Approximated Bias-Variance Tradeoff (ApproBiVT). When evaluating with advanced ASR models, our recipe provides 2.5 AISHELL-2, respectively.
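The recipe described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the training and validation losses are combined, so the sketch assumes a plain sum as the tradeoff score. The function names, the `patience` rule, and the dict-of-floats checkpoint format are likewise hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Sequence


def tradeoff_scores(train_losses: Sequence[float],
                    val_losses: Sequence[float]) -> List[float]:
    """Per-checkpoint score: training loss (bias proxy) plus validation
    loss (variance proxy). The plain sum is an assumption of this sketch."""
    return [t + v for t, v in zip(train_losses, val_losses)]


def stop_epoch(scores: Sequence[float], patience: int = 2) -> int:
    """Index of the epoch at which training would stop: the first epoch
    that is `patience` epochs past the best score without improvement
    (hypothetical stopping rule)."""
    best = 0
    for i, score in enumerate(scores):
        if score < scores[best]:
            best = i
        elif i - best >= patience:
            return i
    return len(scores) - 1


def select_checkpoints(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k checkpoints with the lowest scores, in epoch order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:k])


def average_checkpoints(checkpoints: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Element-wise mean of parameters; each checkpoint is modeled here as
    a dict mapping parameter name to a float for simplicity."""
    n = len(checkpoints)
    return {name: sum(c[name] for c in checkpoints) / n
            for name in checkpoints[0]}


if __name__ == "__main__":
    # Made-up loss curves, for illustration only.
    train = [2.0, 1.5, 1.25, 1.0, 0.75, 0.5]
    val = [2.0, 1.75, 1.5, 1.5, 2.25, 2.75]
    scores = tradeoff_scores(train, val)
    last = stop_epoch(scores, patience=2)
    chosen = select_checkpoints(scores[: last + 1], k=2)
    print("stop at epoch", last, "average checkpoints", chosen)
```

In this toy run the validation loss alone is tied at epochs 2 and 3, while the combined score separates them, which is the kind of signal the training loss adds. With a real framework, the averaging step would operate on tensors rather than floats.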

