A Generalized Framework for Predictive Clustering and Optimization

05/07/2023
by Aravinth Chembu, et al.

Clustering is a powerful and extensively used data science tool. While clustering is generally thought of as an unsupervised learning technique, there are also supervised variations, such as Späth's clusterwise regression, that attempt to find clusters of data that yield low regression error on a supervised target. We believe that clusterwise regression is just a single vertex of a largely unexplored design space of supervised clustering models. In this article, we define a generalized optimization framework for predictive clustering that admits different cluster definitions (arbitrary point assignment, closest center, and bounding box) and both regression and classification objectives. We then present a joint optimization strategy that exploits mixed-integer linear programming (MILP) for global optimization in this generalized framework. To alleviate scalability concerns for large datasets, we also provide highly scalable greedy algorithms inspired by the Majorization-Minimization (MM) framework. Finally, we demonstrate the ability of our models to uncover different interpretable discrete cluster structures in data by experimenting with four real-world datasets.
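
To make the idea of clusterwise regression concrete, the following is a minimal sketch of a standard alternating heuristic for the arbitrary point-assignment case: fit one least-squares model per cluster, then reassign each point to the cluster whose model predicts it best. It is not the MILP formulation or the MM-inspired algorithms described in the article; the function name `clusterwise_regression`, its parameters, and the choice of plain linear models with an intercept are all assumptions made for illustration.

```python
import numpy as np

def clusterwise_regression(X, y, k, n_iter=50, seed=0):
    """Alternating heuristic for clusterwise linear regression (illustrative only).

    X: (n, d) feature matrix, y: (n,) target, k: number of clusters.
    Returns (labels, coefs, sse), where coefs[c] holds the d weights
    followed by the intercept for cluster c.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])      # append an intercept column
    labels = rng.integers(0, k, size=n)       # random initial assignment
    coefs = np.zeros((k, Xb.shape[1]))

    for _ in range(n_iter):
        # Step 1: refit a least-squares model on each cluster's points.
        for c in range(k):
            mask = labels == c
            # Skip clusters with too few points; they keep their old coefficients.
            if mask.sum() >= Xb.shape[1]:
                coefs[c], *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)

        # Step 2: reassign each point to the model with the smallest squared residual.
        residuals = (Xb @ coefs.T - y[:, None]) ** 2   # shape (n, k)
        new_labels = residuals.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                       # assignments are stable
        labels = new_labels

    pred = (Xb @ coefs.T)[np.arange(n), labels]
    sse = float(np.sum((pred - y) ** 2))
    return labels, coefs, sse
```

Neither step increases the total squared error, so the procedure converges, but only to a local optimum that depends on the random initial assignment. That limitation is the usual motivation for global formulations such as the MILP approach mentioned above.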

