Improving State-of-the-Art in One-Class Classification by Leveraging Unlabeled Data

by Farid Bagirov, et al.

When dealing with binary classification of data in which only one class is labeled, data scientists employ two main approaches: One-Class (OC) classification and Positive-Unlabeled (PU) learning. The former learns only from labeled positive data, whereas the latter also uses unlabeled data to improve overall performance. Since PU learning uses more data, it is tempting to conclude that whenever unlabeled data is available, the go-to algorithms should come from the PU group. However, we find that this is not always the case when the unlabeled data is unreliable, i.e., contains limited or biased latent negative data. We perform an extensive experimental study of a wide range of state-of-the-art OC and PU algorithms in scenarios that differ in the reliability of the unlabeled data. Furthermore, we propose PU modifications of state-of-the-art OC algorithms that are robust to unreliable unlabeled data, as well as a guideline for modifying other OC algorithms in the same way. Our main practical recommendation is to use state-of-the-art PU algorithms when the unlabeled data is reliable and to use the proposed modifications of state-of-the-art OC algorithms otherwise. Additionally, we outline procedures that use statistical tests to distinguish reliable from unreliable unlabeled data.
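To make the recommendation above concrete, here is a minimal sketch of how a statistical test could feed the choice between the two algorithm families. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not say which tests are used, so a generic two-sample permutation test on the difference of means is swapped in, applied to a hypothetical one-dimensional feature or score. The names `permutation_p_value` and `recommend_family`, the threshold `alpha`, and the decision rule itself are all assumptions made for this example. The rule also covers only the "limited negatives" case: a detectable difference between positive and unlabeled samples does not rule out biased negatives.

```python
import random
from statistics import mean


def permutation_p_value(pos, unl, n_perm=2000, seed=0):
    """Two-sample permutation test on the absolute difference of means.

    pos: 1-D values for labeled positive samples (hypothetical feature or score)
    unl: 1-D values for unlabeled samples
    Returns an estimated p-value for "both samples share one distribution".
    """
    rng = random.Random(seed)
    observed = abs(mean(pos) - mean(unl))
    pooled = list(pos) + list(unl)
    n_pos = len(pos)
    hits = 0
    for _ in range(n_perm):
        # Reassign the group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_pos]) - mean(pooled[n_pos:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def recommend_family(pos, unl, alpha=0.05):
    """Assumed decision rule, for illustration only.

    If the unlabeled sample is statistically indistinguishable from the
    positives, it likely holds few latent negatives, so it is treated as
    unreliable and a modified OC algorithm is suggested.
    """
    p = permutation_p_value(pos, unl)
    return "PU algorithm" if p < alpha else "modified OC algorithm"


if __name__ == "__main__":
    positives = [i / 10 for i in range(20)]
    # Unlabeled data far from the positives: a difference is detectable.
    unlabeled_shifted = [10 + i / 10 for i in range(20)]
    # Unlabeled data identical to the positives: no detectable difference.
    unlabeled_same = list(positives)
    print(recommend_family(positives, unlabeled_shifted))
    print(recommend_family(positives, unlabeled_same))
```

In practice the inputs would be multivariate, so a multivariate two-sample test or a test on classifier scores would be a more natural fit; the one-dimensional version only keeps the sketch short.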




A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels

Positive-unlabeled learning refers to the process of training a binary c...

Bringing Giant Neural Networks Down to Earth with Unlabeled Data

Compressing giant neural networks has gained much attention for their ex...

Estimating the class prior and posterior from noisy positives and unlabeled data

We develop a classification algorithm for estimating posterior distribut...

Community-Based Hierarchical Positive-Unlabeled (PU) Model Fusion for Chronic Disease Prediction

Positive-Unlabeled (PU) Learning is a challenge presented by binary clas...

Confusion Matrices and Accuracy Statistics for Binary Classifiers Using Unlabeled Data: The Diagnostic Test Approach

Medical researchers have solved the problem of estimating the sensitivit...

Class-prior Estimation for Learning from Positive and Unlabeled Data

We consider the problem of estimating the class prior in an unlabeled da...

A Boosting Algorithm for Positive-Unlabeled Learning

Positive-unlabeled (PU) learning deals with binary classification proble...
