Seed Word Selection for Weakly-Supervised Text Classification with Unsupervised Error Estimation

04/20/2021
by Yiping Jin, et al.

Weakly-supervised text classification aims to induce text classifiers from only a few user-provided seed words. The vast majority of previous work assumes that high-quality seed words are given. However, expert-annotated seed words are sometimes non-trivial to come up with. Furthermore, in the weakly-supervised learning setting, we do not have any labeled documents with which to measure the seed words' efficacy, making the seed word selection process "a walk in the dark". In this work, we remove the need for expert-curated seed words by first mining (noisy) candidate seed words associated with the category names. We then train interim models with individual candidate seed words. Lastly, we estimate the interim models' error rates in an unsupervised manner. The seed words that yield the lowest estimated error rates are added to the final seed word set. A comprehensive evaluation on six binary classification tasks across four popular datasets demonstrates that the proposed method outperforms a baseline that uses only category-name seed words and achieves performance comparable to a counterpart using expert-annotated seed words.
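
The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how interim models are trained or how their error is estimated, so `train_interim` and `estimate_error` are hypothetical callables supplied by the caller, and the toy error values below are made up purely to show the ranking step.

```python
from typing import Callable, Iterable, List, Tuple


def select_seed_words(
    candidates: Iterable[str],
    train_interim: Callable[[str], object],
    estimate_error: Callable[[object], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Rank candidate seed words by the estimated error of their interim model.

    Both callables are assumptions for illustration; the abstract only says
    that one interim model is trained per candidate and that its error rate
    is estimated without labeled documents.
    """
    scored = []
    for word in candidates:
        model = train_interim(word)  # one interim model per candidate word
        scored.append((word, estimate_error(model)))
    scored.sort(key=lambda pair: pair[1])  # lowest estimated error first
    return scored[:top_k]


# Toy stand-ins (made-up numbers, not results from the paper).
toy_errors = {"goal": 0.12, "match": 0.30, "the": 0.49, "league": 0.18}

selected = select_seed_words(
    toy_errors,
    train_interim=lambda word: word,  # pretend the "model" is just the word
    estimate_error=lambda model: toy_errors[model],
    top_k=2,
)
print(selected)
```

Keeping the training and error-estimation steps as parameters leaves the sketch agnostic about the concrete classifier and estimator, which the abstract leaves open.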

research
11/20/2021

Weakly Supervised Prototype Topic Model with Discriminative Seed Words: Modifying the Category Prior by Self-exploring Supervised Signals

Dataless text classification, i.e., a new paradigm of weakly supervised ...
research
10/13/2022

LIME: Weakly-Supervised Text Classification Without Seeds

In weakly-supervised text classification, only label names act as source...
research
12/16/2021

Hyperbolic Disentangled Representation for Fine-Grained Aspect Extraction

Automatic identification of salient aspects from user reviews is especia...
research
05/24/2023

Debiasing Made State-of-the-art: Revisiting the Simple Seed-based Weak Supervision for Text Classification

Recent advances in weakly supervised text classification mostly focus on...
research
09/01/2019

Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training

User-generated reviews can be decomposed into fine-grained segments (e.g...
research
05/22/2023

A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches

Extremely Weakly Supervised Text Classification (XWS-TC) refers to text c...
research
10/24/2020

X-Class: Text Classification with Extremely Weak Supervision

In this paper, we explore to conduct text classification with extremely ...
