Nearest Neighbor Classification based on Imbalanced Data: A Statistical Approach

06/22/2022
by Anvit Garg, et al.

In a classification problem where the competing classes are not of comparable size, many popular classifiers exhibit a bias toward the larger classes, and the nearest neighbor classifier is no exception. To address this problem, in this article we develop a statistical method for nearest neighbor classification based on such imbalanced data sets. We first construct a classifier for the binary classification problem and then extend it to classification problems involving more than two classes. Unlike existing oversampling methods, our proposed classifiers do not need to generate any pseudo-observations, so the results are exactly reproducible. We establish the Bayes risk consistency of these classifiers under appropriate regularity conditions. Their superior performance over existing methods is amply demonstrated by analyzing several benchmark data sets.
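The abstract does not spell out the proposed classification rule, so the sketch below does not reproduce it. It only illustrates the majority-class bias described in the first sentence, using a hypothetical 1-D toy data set: a query whose single nearest neighbor is a minority point is still assigned to the majority class by a plain 7-NN vote, because the majority class supplies more of the neighbors. For contrast, it also shows a common ad hoc fix that divides each class's vote count by that class's training size. That reweighting is a generic heuristic chosen for illustration, not the authors' method, and all function names and data here are assumptions.

```python
from collections import Counter


def knn_votes(train, query, k):
    """Count class labels among the k training points nearest to `query`.

    `train` is a list of (x, label) pairs with scalar x (1-D for simplicity).
    """
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in neighbors)


def plain_knn(train, query, k):
    # Standard majority vote: every neighbor counts once, whatever its class size.
    votes = knn_votes(train, query, k)
    return max(votes, key=votes.get)


def prior_weighted_knn(train, query, k):
    # Illustrative heuristic only (NOT the paper's classifier): scale each
    # class's vote count by 1 / (training size of that class), so neighbors
    # from a small class carry more weight than neighbors from a large one.
    class_sizes = Counter(label for _, label in train)
    votes = knn_votes(train, query, k)
    scores = {c: votes[c] / class_sizes[c] for c in votes}
    return max(scores, key=scores.get)


# Hypothetical imbalanced toy data: 20 majority points (label 0), 3 minority (label 1).
train = [(x, 0) for x in range(20)] + [(x, 1) for x in (21, 22, 23)]

query = 20.5  # its single nearest training point (21) belongs to the minority class
print(plain_knn(train, query, k=1))           # → 1
print(plain_knn(train, query, k=7))           # → 0
print(prior_weighted_knn(train, query, k=7))  # → 1
```

With k = 7 the neighbors are four majority points and three minority points, so the plain vote picks class 0. After dividing by the class sizes (20 and 3), the minority score (3/3) exceeds the majority score (4/20). Reweighting of this kind changes only how votes are counted and generates no synthetic points, which is the property the abstract contrasts with oversampling.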
