Evaluation of Sampling Methods for Scatterplots

by Jun Yuan, et al.

Given a scatterplot with tens of thousands of points or more, a natural question is which sampling method should be used to create a small but "good" scatterplot that serves as a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand how well sampling methods preserve the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling perform comparably to the three multi-class sampling strategies in preserving class density; 3) outlier-biased density-based sampling, recursive-subdivision-based sampling, and blue noise sampling perform the best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.
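To make two of the strategies named in the abstract concrete, here is a minimal sketch of random sampling and of a blue-noise-style sampler. The abstract does not specify the algorithms used in the study, so everything here is an assumption for illustration: the function names `random_sample` and `blue_noise_sample`, the `radius` parameter, and the choice of greedy dart throwing (one common way to obtain a blue-noise-like point set, not necessarily the variant the authors evaluated).

```python
import math
import random


def random_sample(points, k, seed=0):
    """Uniform random sampling: pick k points without replacement."""
    rng = random.Random(seed)
    return rng.sample(points, k)


def blue_noise_sample(points, radius, seed=0):
    """Greedy dart throwing (illustrative assumption, not the paper's code).

    Visit points in random order and keep a point only if no previously
    kept point lies closer than `radius`.
    """
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)

    # With cell side radius / sqrt(2), each grid cell can hold at most
    # one kept point, so a dict from cell to point is enough.
    cell = radius / math.sqrt(2)
    grid = {}
    kept = []
    for x, y in order:
        cx, cy = math.floor(x / cell), math.floor(y / cell)
        too_close = False
        # Any kept point within `radius` must sit within 2 cells of (cx, cy).
        for i in range(cx - 2, cx + 3):
            for j in range(cy - 2, cy + 3):
                q = grid.get((i, j))
                if q is not None and math.hypot(q[0] - x, q[1] - y) < radius:
                    too_close = True
        if not too_close:
            grid[(cx, cy)] = (x, y)
            kept.append((x, y))
    return kept
```

In this sketch, random sampling keeps points in proportion to local density, whereas the dart-throwing sampler enforces a minimum spacing, so it thins dense regions and tends to retain isolated points. The sample size of the second function is controlled indirectly through `radius` rather than through a target count.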



UGRWO-Sampling: A modified random walk under-sampling approach based on graphs to imbalanced data classification

In this paper, we propose a new RWO-Sampling (Random Walk Over-Sampling)...

DOS: Diverse Outlier Sampling for Out-of-Distribution Detection

Modern neural networks are known to give overconfident prediction for ou...

Analysis of Sampling Strategies for Implicit 3D Reconstruction

In the training process of the implicit 3D reconstruction network, the c...

On Classification from Outlier View

Classification is the basis of cognition. Unlike other solutions, this s...

Signal Recovery on Graphs: Random versus Experimentally Designed Sampling

We study signal recovery on graphs based on two sampling strategies: ran...

A One-Sided Classification Toolkit with Applications in the Analysis of Spectroscopy Data

This dissertation investigates the use of one-sided classification algor...

Universal Smoothed Score Functions for Generative Modeling

We consider the problem of generative modeling based on smoothing an unk...
