Distance metric learning based on structural neighborhoods for dimensionality reduction and classification performance improvement
Distance metric learning can be viewed as one of the fundamental interests in pattern recognition and machine learning, which plays a pivotal role in the performance of many learning methods. One of the effective methods in learning such a metric is to learn it from a set of labeled training samples. The issue of data imbalance is the most important challenge of recent methods. This research tries not only to preserve the local structures but also covers the issue of imbalanced datasets. To do this, the proposed method first tries to extract a low dimensional manifold from the input data. Then, it learns the local neighborhood structures and the relationship of the data points in the ambient space based on the adjacencies of the same data points on the embedded low dimensional manifold. Using the local neighborhood relationships extracted from the manifold space, the proposed method learns the distance metric in a way which minimizes the distance between similar data and maximizes their distance from the dissimilar data points. The evaluations of the proposed method on numerous datasets from the UCI repository of machine learning, and also the KDDCup98 dataset as the most imbalance dataset, justify the supremacy of the proposed approach in comparison with other approaches especially when the imbalance factor is high.
READ FULL TEXT