RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query

by Honghui Mei, et al.

Analysts commonly investigate data distributions that are derived from statistical aggregations and rendered as charts, such as histograms and binned scatterplots, to visualize and analyze large-scale datasets. Aggregate queries are implicitly executed during this process. Because datasets are often extremely large, response times are typically reduced by precomputing data cubes. However, queries are then limited to the predefined binning schema of the preprocessed cubes, which hinders analysts from flexibly adjusting visual specifications to investigate implicit patterns in the data. RSATree addresses this limitation by enabling arbitrary queries and flexible binning strategies through three schemes: an R-tree-based space partitioning scheme to capture the data distribution, a locality-sensitive hashing technique to achieve locality-preserving random access to data items, and a summed area table scheme to support interactive queries of aggregated values with linear computational complexity. This study presents and implements a web-based visual query system that supports visual specification, query, and exploration of large-scale tabular data with user-adjustable granularities. We demonstrate the efficiency and utility of our approach through experiments on real-world datasets and an analysis of time and space complexity.
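
To illustrate why a summed area table suits flexible binning, the following is a minimal sketch of the general technique on a plain 2D grid of counts. It is not the RSATree implementation: the R-tree partitioning and locality-sensitive hashing are omitted, and the names build_sat, range_sum, and rebin are hypothetical helpers chosen for this example. Once the table is built, the aggregate over any axis-aligned rectangle of cells takes four lookups, so a chart can be re-binned at a new granularity without rescanning the underlying counts.

```python
from typing import List

Grid = List[List[int]]


def build_sat(grid: Grid) -> Grid:
    """Build a summed area table with one extra zero row and column.

    sat[r][c] holds the sum of grid[0:r][0:c], so sat[0][*] and sat[*][0] are 0.
    """
    rows, cols = len(grid), len(grid[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (
                grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
            )
    return sat


def range_sum(sat: Grid, r0: int, c0: int, r1: int, c1: int) -> int:
    """Sum of the half-open cell rectangle [r0, r1) x [c0, c1): four lookups."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


def rebin(sat: Grid, row_edges: List[int], col_edges: List[int]) -> Grid:
    """Aggregate the fine grid into coarser bins given by cell-index edges."""
    return [
        [
            range_sum(sat, row_edges[i], col_edges[j],
                      row_edges[i + 1], col_edges[j + 1])
            for j in range(len(col_edges) - 1)
        ]
        for i in range(len(row_edges) - 1)
    ]


# Example data (made up for illustration): a 4x4 grid of per-cell item counts.
counts = [
    [1, 2, 0, 1],
    [0, 3, 1, 0],
    [2, 0, 1, 4],
    [1, 1, 0, 2],
]
sat = build_sat(counts)
coarse = rebin(sat, [0, 2, 4], [0, 2, 4])   # 2x2 binning of the same data
uneven = rebin(sat, [0, 1, 4], [0, 3, 4])   # bins of unequal size
```

The bin edges are passed as cell indices, so equal-width and uneven binning schemes go through the same rebin call; only the edge lists change.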




