Fair Coresets and Streaming Algorithms for Fair k-Means Clustering
We study fair clustering problems as proposed by Chierichetti et al. Here, each point has a sensitive attribute, and every cluster in the solution is required to be balanced with respect to it (to counteract any form of data-inherent bias). Previous algorithms for fair clustering do not scale well. We show how to model and compute so-called coresets for fair clustering problems, which can be used to significantly reduce the input data size. We prove that the coresets are composable and show how to compute them in a streaming setting. We also propose a novel combination of the coreset construction with a sketching technique due to Cohen et al., which may be of independent interest. We conclude with an empirical evaluation.
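To make the balance requirement concrete, the following is a minimal sketch of the two-color balance measure in the sense of Chierichetti et al.: the balance of a cluster is the smaller of the two color ratios, and the balance of a clustering is the minimum over its clusters. The function names and the color labels "red" and "blue" are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter, defaultdict


def balance(colors, a="red", b="blue"):
    """Balance of one cluster for two colors a and b (hypothetical labels).

    Returns min(#a / #b, #b / #a), or 0.0 if either color is missing.
    """
    counts = Counter(colors)
    if counts[a] == 0 or counts[b] == 0:
        return 0.0
    return min(counts[a] / counts[b], counts[b] / counts[a])


def clustering_balance(labels, colors, a="red", b="blue"):
    """Balance of a clustering: the minimum balance over all its clusters."""
    clusters = defaultdict(list)
    for label, color in zip(labels, colors):
        clusters[label].append(color)
    return min(balance(members, a, b) for members in clusters.values())
```

A clustering is perfectly balanced when this value is 1, and a cluster containing only one color drives it to 0.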