Coresets for Clustering with Fairness Constraints

by Lingxiao Huang, et al.

In a recent work, Chierichetti et al. studied the following "fair" variants of classical clustering problems such as k-means and k-median: given a set of n data points in R^d and a binary type associated with each data point, the goal is to cluster the points while ensuring that the proportion of each type in each cluster is roughly the same as its underlying proportion. Subsequent work has focused either on extending this setting to the case where each data point has multiple, non-disjoint sensitive types such as race and gender, or on addressing the problem that the clustering algorithms in the above work do not scale well. The main contribution of this paper is an approach to clustering with fairness constraints that involve multiple, non-disjoint types and that is also scalable. Our approach is based on novel constructions of coresets: for the k-median objective, we construct an ε-coreset of size O(Γ k^2 ε^{-d}), where Γ is the number of distinct collections of groups that a point may belong to, and for the k-means objective, we show how to construct an ε-coreset of size O(Γ k^3 ε^{-d-1}). The former result is the first known coreset construction for the fair clustering problem with the k-median objective. The latter result removes the dependence on the size of the full dataset that is present in the construction of Schmidt et al., and generalizes that construction to multiple, non-disjoint types. Plugging our coresets into existing algorithms for fair clustering, such as that of Backurs et al., results in the fastest algorithms for several cases. Empirically, we evaluate our approach on the Adult and Bank datasets and show that the coreset sizes are much smaller than the full dataset. Applying coresets indeed reduces the running time of computing the fair clustering objective while keeping the resulting difference in objective value small.
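The abstract defines Γ as the number of distinct collections of groups that a point may belong to. It states the fairness requirement as each type's share in every cluster being roughly its overall share. The Python sketch below (standard library only) makes these two notions concrete and evaluates the stated size bounds on toy numbers. It is an illustration only, not the authors' code. The function names, the tolerance `tol` used to formalize "roughly the same", and setting the hidden O(·) constants to 1 are all assumptions. The sketch does not implement the coreset construction itself.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence

# A point's sensitive attributes: the set of (possibly overlapping) groups it belongs to.
Groups = FrozenSet[str]


def num_collections(memberships: Sequence[Groups]) -> int:
    """Gamma: the number of distinct group collections that occur among the points."""
    return len(set(memberships))


def partition_by_collection(memberships: Sequence[Groups]) -> Dict[Groups, List[int]]:
    """Split point indices into Gamma parts, one per distinct group collection."""
    parts: Dict[Groups, List[int]] = defaultdict(list)
    for i, collection in enumerate(memberships):
        parts[collection].append(i)
    return dict(parts)


def is_roughly_fair(memberships: Sequence[Groups], labels: Sequence[Hashable], tol: float) -> bool:
    """True if, in every cluster, each group's share is within `tol` of its share in the
    whole dataset. `tol` is a hypothetical formalization of "roughly the same"."""
    n = len(memberships)
    overall = Counter(g for m in memberships for g in m)
    clusters: Dict[Hashable, List[Groups]] = defaultdict(list)
    for m, label in zip(memberships, labels):
        clusters[label].append(m)
    for members in clusters.values():
        counts = Counter(g for m in members for g in m)
        for group, total in overall.items():
            # Groups may overlap, so the shares of different groups need not sum to 1.
            if abs(counts[group] / len(members) - total / n) > tol:
                return False
    return True


def size_bound(gamma: int, k: int, eps: float, d: int, objective: str) -> float:
    """Coreset-size bounds from the abstract with the hidden O(.) constants set to 1."""
    if objective == "k-median":
        return gamma * k**2 * eps ** (-d)
    if objective == "k-means":
        return gamma * k**3 * eps ** (-d - 1)
    raise ValueError(f"unknown objective: {objective}")


if __name__ == "__main__":
    # Toy data: two overlapping attributes (sex and a hypothetical label "A"/"B").
    members = [
        frozenset({"female", "A"}), frozenset({"male", "A"}), frozenset({"female", "B"}),
        frozenset({"male", "B"}), frozenset({"female", "A"}), frozenset({"male", "B"}),
    ]
    print(num_collections(members))                                 # 4
    print(is_roughly_fair(members, [0, 1, 1, 0, 2, 2], tol=0.0))    # True
    print(is_roughly_fair(members, [0, 0, 1, 1, 0, 1], tol=0.2))    # False
    print(size_bound(4, 5, 0.5, 2, "k-median"))                     # 400.0
    print(size_bound(4, 5, 0.5, 2, "k-means"))                      # 4000.0
```

The partition by collection is included because the factor Γ in both bounds suggests that each distinct collection is summarized separately. How the paper summarizes each part is not reproduced here.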

