Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

05/22/2017
by   Napat Rujeerapaiboon, et al.
0

Plain vanilla K-means clustering is prone to produce unbalanced clusters and suffers from outlier sensitivity. To mitigate both shortcomings, we formulate a joint outlier detection and clustering problem, which assigns a prescribed number of datapoints to an auxiliary outlier cluster and performs cardinality-constrained K-means clustering on the residual dataset. We cast this problem as a mixed-integer linear program (MILP) that admits tractable semidefinite and linear programming relaxations. We propose deterministic rounding schemes that transform the relaxed solutions to feasible solutions for the MILP. We also prove that these solutions are optimal in the MILP if a cluster separation condition holds.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset