Large Language Models Enable Few-Shot Clustering

07/02/2023
by Vijay Viswanathan, et al.

Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm match the user's intent. Existing approaches to semi-supervised clustering require a significant amount of feedback from an expert to improve the clusters. In this paper, we ask whether a large language model (LLM) can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering. We show that LLMs are surprisingly effective at improving clustering. We explore three stages where LLMs can be incorporated into clustering: before clustering (improving input features), during clustering (providing constraints to the clusterer), and after clustering (using LLMs for post-correction). We find that incorporating LLMs in the first two stages routinely provides significant improvements in cluster quality, and that LLMs enable a user to make trade-offs between cost and accuracy to produce the desired clusters. We release our code and LLM prompts for the public to use.
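As a rough illustration of the "during clustering" stage, the sketch below collects constraints under a fixed query budget and merges the positive answers into groups. It is not the authors' released code. The pairwise must-link / cannot-link form of the constraints is an assumption made here for illustration, and `collect_constraints`, `must_link_groups`, and `toy_oracle` are hypothetical names; `toy_oracle` is a trivial stand-in for a prompted LLM call.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]


def collect_constraints(
    texts: List[str],
    same_cluster: Callable[[str, str], bool],
    budget: int,
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair]]:
    """Ask a pairwise oracle about at most `budget` randomly chosen pairs."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(len(texts)), 2))
    rng.shuffle(pairs)
    must_link: List[Pair] = []
    cannot_link: List[Pair] = []
    for i, j in pairs[:budget]:
        if same_cluster(texts[i], texts[j]):
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


def must_link_groups(n: int, must_link: List[Pair]) -> List[List[int]]:
    """Merge must-link pairs transitively with a small union-find."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in must_link:
        parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def toy_oracle(a: str, b: str) -> bool:
    # Hypothetical stand-in for an LLM query: same first word means same cluster.
    return a.split()[0] == b.split()[0]
```

A constrained clusterer would then take the must-link groups and cannot-link pairs as side information. The `budget` argument is where the trade-off between cost and accuracy mentioned in the abstract would show up: each queried pair corresponds to one oracle call.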


research
05/04/2017

Semi-supervised model-based clustering with controlled clusters leakage

In this paper, we focus on finding clusters in partially categorized dat...
research
08/03/2016

Improving Quality of Hierarchical Clustering for Large Data Series

Brown clustering is a hard, hierarchical, bottom-up clustering of words ...
research
05/24/2023

ClusterLLM: Large Language Models as a Guide for Text Clustering

We introduce ClusterLLM, a novel text clustering framework that leverage...
research
04/16/2023

USNID: A Framework for Unsupervised and Semi-supervised New Intent Discovery

New intent discovery is of great value to natural language processing, a...
research
11/20/2017

Relaxed Oracles for Semi-Supervised Clustering

Pairwise "same-cluster" queries are one of the most widely used forms of...
research
10/20/2019

Identification of Interaction Clusters Using a Semi-supervised Hierarchical Clustering Method

Motivation: Identifying interaction clusters of large gene regulatory ne...
research
12/24/2014

An Effective Semi-supervised Divisive Clustering Algorithm

Nowadays, data are generated massively and rapidly from scientific field...
