Representative Selection for Big Data via Sparse Graph and Geodesic Grassmann Manifold Distance

by Chinh Dang, et al.

This paper addresses the problem of identifying a very small subset of data points from a much larger dataset (i.e., Big Data). The selected points must adequately represent and faithfully characterize the full dataset. This identification process is known as representative selection [19]. We propose a novel representative selection framework that generates an ℓ1-norm sparse graph for a given Big Data dataset. The data is partitioned recursively into clusters using a spectral clustering algorithm on the generated sparse graph. We consider each cluster as one point on a Grassmann manifold and measure the geodesic distance among these points. The distances are further analyzed using a min-max algorithm [1] to extract an optimal subset of clusters. Finally, by considering a sparse subgraph of each selected cluster, we detect a representative using principal component centrality [11]. We refer to the proposed representative selection framework as the Sparse Graph and Grassmann Manifold (SGGM) based approach. To validate the SGGM framework, we apply it to the problem of video summarization, where only a few video frames, known as key frames, are selected from a much longer video sequence. A comparison of the results obtained by the proposed algorithm with the ground truth, which is agreed upon by multiple human judges, and with some state-of-the-art methods clearly indicates the viability of the SGGM framework.
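To make the Grassmann and min-max steps concrete, the following is a minimal sketch, not the authors' implementation. It assumes each cluster is summarized by an orthonormal basis of its dominant subspace, uses the standard principal-angle formula for the geodesic distance, and substitutes a greedy farthest-first rule for the min-max algorithm of [1], whose exact form the abstract does not specify. The function names `subspace_basis`, `grassmann_geodesic`, and `farthest_first`, and the subspace dimension `dim`, are hypothetical.

```python
import numpy as np


def subspace_basis(points, dim):
    """Orthonormal basis (as columns) of the top-`dim` left singular subspace.

    `points` has shape (n_features, n_points); each column is one data point.
    Representing a cluster this way is an assumption made for illustration.
    """
    u, _, _ = np.linalg.svd(points, full_matrices=False)
    return u[:, :dim]


def grassmann_geodesic(basis_a, basis_b):
    """Geodesic distance between two subspaces given orthonormal bases.

    The singular values of A^T B are the cosines of the principal angles;
    the geodesic distance is the 2-norm of the vector of those angles.
    """
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))  # clip guards rounding error
    return float(np.linalg.norm(angles))


def farthest_first(dist, k, start=0):
    """Greedy farthest-first selection of k indices from a distance matrix.

    This is a stand-in for the min-max step: each new pick maximizes its
    minimum distance to the indices already chosen.
    """
    chosen = [start]
    min_dist = dist[start].copy()
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three synthetic "clusters" of 20 points each in a 6-dimensional space.
    clusters = [rng.standard_normal((6, 20)) for _ in range(3)]
    bases = [subspace_basis(c, dim=2) for c in clusters]
    n = len(bases)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = grassmann_geodesic(bases[i], bases[j])
    print(farthest_first(dist, k=2))
```

In the full pipeline described above, the clusters would come from spectral clustering on the ℓ1-norm sparse graph rather than from random data, and a representative would then be picked inside each selected cluster.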


Manifold Learning and Deep Clustering with Local Dictionaries

We introduce a novel clustering algorithm for data sampled from a union ...

IPD: An Incremental Prototype based DBSCAN for large-scale data with cluster representatives

DBSCAN is a fundamental density-based clustering technique that identifi...

Rethinking k-means from manifold learning perspective

Although numerous clustering algorithms have been developed, many existi...

Representative Selection in Non Metric Datasets

This paper considers the problem of representative selection: choosing a...

A Quantum Annealing-Based Approach to Extreme Clustering

In this age of data abundance, there is a growing need for algorithms an...

Self-Representation Based Unsupervised Exemplar Selection in a Union of Subspaces

Finding a small set of representatives from an unlabeled dataset is a co...

Scalability and robustness of spectral embedding: landmark diffusion is all you need

While spectral embedding is a widely applied dimension reduction techniq...
