The Dynamic of Consensus in Deep Networks and the Identification of Noisy Labels

by Daniel Shwartz et al.

Deep neural networks have tremendous expressive capacity and can seemingly memorize any training set. This poses a problem when training in the presence of noisy labels, since by the end of training the noisy examples can no longer be distinguished from the clean ones. Recent work has addressed this challenge by exploiting the fact that deep networks appear to memorize clean examples much earlier than noisy ones. Here we report a new empirical result: when looking at the time at which each example is memorized by every model in an ensemble of networks, the diversity across models is much larger for noisy examples than for clean ones. We use this observation to develop a new method for filtering noisy labels, based on a per-example statistic that captures the difference in ensemble learning dynamics between clean and noisy data. We evaluate our method on three tasks: (i) noise level estimation; (ii) noise filtration; (iii) supervised classification. We show that it improves over existing baselines on all three tasks across a variety of datasets, noise models, and noise levels. Beyond its improved performance, the method has two further advantages: (i) it is simple, introducing no additional hyperparameters; (ii) it is modular: rather than operating end-to-end, it can be used to clean a dataset for any later use.
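The core idea, per-example disagreement across an ensemble in *when* each network memorizes an example, can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's exact statistic: the function names are invented, and using the standard deviation of memorization epochs as the spread measure is an assumption.

```python
import numpy as np

def memorization_epochs(correct_history):
    """For each (model, example) pair, find the first epoch after which the
    model's prediction of the (possibly noisy) training label stays correct,
    i.e. the example's "memorization time" for that model.

    correct_history: bool array of shape (n_models, n_epochs, n_examples),
    True where the model predicted the training label correctly.
    Returns an int array of shape (n_models, n_examples); examples never
    memorized are assigned n_epochs.
    """
    n_models, n_epochs, n_examples = correct_history.shape
    epochs = np.full((n_models, n_examples), n_epochs)
    for m in range(n_models):
        for x in range(n_examples):
            wrong = np.nonzero(~correct_history[m, :, x])[0]
            if wrong.size == 0:
                epochs[m, x] = 0            # correct from the start
            elif wrong[-1] < n_epochs - 1:
                epochs[m, x] = wrong[-1] + 1  # first epoch of the final correct run
    return epochs

def noise_scores(epochs):
    """Ensemble disagreement on memorization time: a larger spread across
    models suggests the example's label is noisy."""
    return epochs.std(axis=0)
```

A filtration step would then keep the examples whose score falls below some threshold (e.g. chosen from an estimate of the noise level), and the cleaned dataset can be handed to any downstream training procedure.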


