Nonparametric semi-supervised learning of class proportions

by Shantanu Jain, et al.

The problem of developing binary classifiers from positive and unlabeled data is often encountered in machine learning. A common requirement in this setting is to approximate posterior probabilities of positive and negative classes for a previously unseen data point. This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unlabeled data, and (ii) the accurate estimation of the prior probabilities of positive and negative examples. In this work we primarily focus on the latter subproblem. We study nonparametric class prior estimation and formulate this problem as an estimation of mixing proportions in two-component mixture models, given a sample from one of the components and another sample from the mixture itself. We show that estimation of mixing proportions is generally ill-defined and propose a canonical form to obtain identifiability while maintaining the flexibility to model any distribution. We use insights from this theory to elucidate the optimization surface of the class priors and propose an algorithm for estimating them. To address the problems of high-dimensional density estimation, we provide practical transformations to low-dimensional spaces that preserve class priors. Finally, we demonstrate the efficacy of our method on univariate and multivariate data.
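The identifiability issue described in the abstract can be illustrated with a small numerical sketch. It assumes the two-component form f = alpha * f1 + (1 - alpha) * f0, where f1 is the component with its own sample (the positives) and f is the mixture (the unlabeled data). The Gaussian components, the value alpha = 0.3, the grid, and all function names are hypothetical choices made for illustration; this is not the estimation algorithm proposed in the paper. The sketch shows that any proportion below the true one also leaves a valid (nonnegative) remainder density, so the proportion is not unique unless a convention is adopted. Here we assume the convention of taking the largest feasible proportion, which equals the infimum of f(x) / f1(x).

```python
import math

# Hypothetical setup, chosen only for illustration (not taken from the paper):
# labeled component f1 = N(2, 1), other component f0 = N(0, 1), true alpha = 0.3.
ALPHA = 0.3
GRID = [i * 0.01 for i in range(-800, 801)]  # evaluation points from -8 to 8


def normal_pdf(x, mean, std=1.0):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def f1(x):
    """Density of the component we have a separate sample from."""
    return normal_pdf(x, 2.0)


def f0(x):
    """Density of the other (unobserved) component."""
    return normal_pdf(x, 0.0)


def f(x):
    """Mixture density f = alpha * f1 + (1 - alpha) * f0."""
    return ALPHA * f1(x) + (1.0 - ALPHA) * f0(x)


def remainder_is_nonnegative(beta):
    """True if f - beta * f1 is nonnegative on the grid.

    When this holds, (f - beta * f1) / (1 - beta) is a valid density, so
    beta is also a feasible mixing proportion for the same mixture f.
    """
    return all(f(x) - beta * f1(x) >= 0.0 for x in GRID)


def largest_feasible_proportion():
    """Largest beta with a nonnegative remainder: min over the grid of f / f1."""
    return min(f(x) / f1(x) for x in GRID)


if __name__ == "__main__":
    for beta in (0.1, 0.2, 0.5):
        print(beta, remainder_is_nonnegative(beta))
    print("largest feasible proportion:", largest_feasible_proportion())
```

In this toy setting the ratio f0 / f1 shrinks toward zero in the right tail, so the largest feasible proportion on the grid is very close to the true value. In practice the densities are unknown and must be estimated from the two samples, which is the difficulty the abstract refers to.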



Estimating the class prior and posterior from noisy positives and unlabeled data

We develop a classification algorithm for estimating posterior distribut...

DEDPUL: Method for Mixture Proportion Estimation and Positive-Unlabeled Classification based on Density Estimation

This paper studies Positive-Unlabeled Classification, the problem of sem...

Recovering True Classifier Performance in Positive-Unlabeled Learning

A common approach in positive-unlabeled learning is to train a classific...

Mixture Proportion Estimation and PU Learning: A Modern Approach

Given only positive examples and unlabeled examples (from both positive ...

Class-prior Estimation for Learning from Positive and Unlabeled Data

We consider the problem of estimating the class prior in an unlabeled da...

Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning

In PU learning, a binary classifier is trained from positive (P) and unl...

Improved Estimation of Class Prior Probabilities through Unlabeled Data

Work in the classification literature has shown that in computing a clas...
