Coresets for Clustering in Excluded-minor Graphs and Beyond

by Vladimir Braverman, et al.

Coresets are modern data-reduction tools that are widely used in data analysis to improve efficiency in terms of running time, space and communication complexity. Our main result is a fast algorithm to construct a small coreset for k-Median in (the shortest-path metric of) an excluded-minor graph. Specifically, we give the first coreset whose size depends only on k, ϵ and the excluded-minor size, and our running time is quasi-linear (in the size of the input graph). The main innovation in our new algorithm is that it is iterative; it first reduces the n input points to roughly O(log n) reweighted points, then to O(loglog n), and so forth until the size is independent of n. Each step in this iterative size reduction is based on the importance sampling framework of Feldman and Langberg (STOC 2011), with a crucial adaptation that reduces the number of distinct points by employing a terminal embedding (where low distortion is guaranteed only for the distance from every terminal to all other points). Our terminal embedding is technically involved and relies on shortest-path separators, a standard tool in planar and excluded-minor graphs. Furthermore, our new algorithm is also applicable in Euclidean metrics, by simply using a recent terminal embedding result of Narayanan and Nelson (STOC 2019), which extends the Johnson-Lindenstrauss Lemma. We thus obtain an efficient coreset construction in high-dimensional Euclidean spaces, thereby matching and simplifying state-of-the-art results (Sohler and Woodruff, FOCS 2018; Huang and Vishnoi, STOC 2020). In addition, we employ terminal embedding with additive distortion to obtain small coresets in graphs with bounded highway dimension, and we apply our coresets to obtain improved approximation schemes, e.g., an improved PTAS for planar k-Median via a new centroid set.
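
To make the importance-sampling step mentioned above more concrete, here is a minimal Python sketch of one round of sensitivity-style sampling for k-Median. It is an illustration under simplifying assumptions, not the construction from the paper: it works on Euclidean points rather than graph metrics, it omits the terminal embedding and the iterative size reduction entirely, and the score used (distance share plus inverse cluster size) is a commonly used upper-bound heuristic. The names `kmedian_cost`, `importance_sample` and `rough_centers` are hypothetical.

```python
import math
import random


def kmedian_cost(points, centers, weights=None):
    """Weighted k-Median cost: sum of w(p) * distance from p to its nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in zip(points, weights))


def importance_sample(points, rough_centers, m, rng):
    """Draw m reweighted points (assumed sketch of one sampling round).

    rough_centers is any crude k-Median solution; each point is sampled with
    probability proportional to an importance score, and reweighted by the
    inverse of (m * probability) so the weighted cost is an unbiased estimate.
    """
    # Assign every point to its nearest rough center.
    nearest = [min(range(len(rough_centers)),
                   key=lambda j: math.dist(p, rough_centers[j]))
               for p in points]
    dists = [math.dist(p, rough_centers[j]) for p, j in zip(points, nearest)]
    total = sum(dists)
    sizes = [0] * len(rough_centers)
    for j in nearest:
        sizes[j] += 1

    # Importance score: share of the rough cost plus inverse cluster size.
    scores = [(d / total if total > 0 else 0.0) + 1.0 / sizes[j]
              for d, j in zip(dists, nearest)]
    norm = sum(scores)
    probs = [s / norm for s in scores]

    # Sample with replacement and attach inverse-probability weights.
    idx = rng.choices(range(len(points)), weights=probs, k=m)
    sample = [points[i] for i in idx]
    weights = [1.0 / (m * probs[i]) for i in idx]
    return sample, weights
```

A typical use would be to call `importance_sample` once with a crude solution and then evaluate candidate center sets with `kmedian_cost(sample, centers, weights)` instead of on the full input. The guarantee that this estimate is within a (1 ± ϵ) factor for all center sets simultaneously is what the coreset analysis provides; the sketch itself only gives an unbiased estimate for any fixed center set.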




Low Treewidth Embeddings of Planar and Minor-Free Metrics

Cohen-Addad, Filtser, Klein and Le [FOCS'20] constructed a stochastic em...

Coresets for Clustering in Graphs of Bounded Treewidth

We initiate the study of coresets for clustering in graph metrics, i.e.,...

The Power of Uniform Sampling for Coresets

Motivated by practical generalizations of the classic k-median and k-mea...

On Light Spanners, Low-treewidth Embeddings and Efficient Traversing in Minor-free Graphs

Understanding the structure of minor-free metrics, namely shortest path ...

On Efficient Low Distortion Ultrametric Embedding

A classic problem in unsupervised learning and data analysis is to find ...

Euclidean TSP in Narrow Strip

We investigate how the complexity of Euclidean TSP for point sets P insi...

Coresets for Clustering in Geometric Intersection Graphs

Designing coresets–small-space sketches of the data preserving cost of t...
