Learned k-NN Distance Estimation

08/29/2022
by   Daichi Amagata, et al.
0

Big data mining is well known to be an important task for data science, because it can provide useful observations and new knowledge hidden in given large datasets. Proximity-based data analysis is particularly utilized in many real-life applications. In such analysis, the distances to k nearest neighbors are usually employed, thus its main bottleneck is derived from data retrieval. Much efforts have been made to improve the efficiency of these analyses. However, they still incur large costs, because they essentially need many data accesses. To avoid this issue, we propose a machine-learning technique that quickly and accurately estimates the k-NN distances (i.e., distances to the k nearest neighbors) of a given query. We train a fully connected neural network model and utilize pivots to achieve accurate estimation. Our model is designed to have useful advantages: it infers distances to the k-NNs at a time, its inference time is O(1) (no data accesses are incurred), but it keeps high accuracy. Our experimental results and case studies on real datasets demonstrate the efficiency and effectiveness of our solution.

READ FULL TEXT
research
06/01/2018

k-nearest neighbors prediction and classification for spatial data

We propose a nonparametric predictor and a supervised classification bas...
research
10/11/2018

A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

In the k-nearest neighborhood model (k-NN), we are given a set of points...
research
11/29/2015

k-Nearest Neighbour Classification of Datasets with a Family of Distances

The k-nearest neighbour (k-NN) classifier is one of the oldest and most ...
research
05/28/2015

An Analogy Based Method for Freight Forwarding Cost Estimation

The author explored estimation by analogy (EBA) as a means of estimating...
research
03/28/2016

Analysis of k-Nearest Neighbor Distances with Application to Entropy Estimation

Estimating entropy and mutual information consistently is important for ...
research
03/05/2020

Fuzzy k-Nearest Neighbors with monotonicity constraints: Moving towards the robustness of monotonic noise

This paper proposes a new model based on Fuzzy k-Nearest Neighbors for c...
research
12/28/2010

DD-EbA: An algorithm for determining the number of neighbors in cost estimation by analogy using distance distributions

Case Based Reasoning and particularly Estimation by Analogy, has been us...

Please sign up or login with your details

Forgot password? Click here to reset