Rank-Aware Negative Training for Semi-Supervised Text Classification

by Ahmed Murtadha, et al.

Semi-supervised text classification (SSTC) methods typically follow the spirit of self-training: a deep classifier is trained on a limited set of labeled texts and then iteratively predicts pseudo-labels for the unlabeled texts, which are used for further training. However, performance is largely determined by the accuracy of the pseudo-labels, which may be low in real-world scenarios. This paper presents a Rank-aware Negative Training (RNT) framework that addresses SSTC as learning with noisy labels. To alleviate the noisy information, we adopt a reasoning-with-uncertainty approach to rank the unlabeled texts according to the evidential support they receive from the labeled texts. Moreover, we propose training RNT with negative training, which is based on the idea that "the input instance does not belong to the complementary label." A complementary label is randomly selected from all labels except the on-target label. Intuitively, the probability of the true label being chosen as the complementary label is low, so negative training introduces less noisy information and yields better performance on the test data. Finally, we evaluate the proposed solution on various text classification benchmark datasets. Extensive experiments show that RNT consistently outperforms state-of-the-art alternatives in most scenarios and achieves competitive performance in the others. The code of RNT is publicly available at: https://github.com/amurtadha/RNT.
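The negative-training idea described above can be sketched in a few lines. The following is a minimal illustration, not the RNT implementation: for each instance, a complementary label is sampled uniformly from all classes except the pseudo-label, and the model is penalized for assigning probability to that complementary class via the loss -log(1 - p_complementary). All function and variable names here are illustrative assumptions.

```python
import numpy as np

def negative_training_loss(probs, pseudo_labels, num_classes, rng):
    """Illustrative negative-learning loss.

    For each instance, sample a complementary label (any class except
    the on-target pseudo-label) and penalize confidence in it with
    -log(1 - p_complementary). Since the true label is unlikely to be
    drawn as the complementary label, noisy pseudo-labels inject less
    misleading gradient signal than standard positive training.

    probs: (N, C) array of softmax outputs.
    pseudo_labels: (N,) array of (possibly noisy) pseudo-labels.
    """
    losses = []
    for p, y in zip(probs, pseudo_labels):
        # Complementary label: uniform over all classes except y.
        candidates = [c for c in range(num_classes) if c != y]
        comp = rng.choice(candidates)
        # Penalize probability mass on the complementary class.
        losses.append(-np.log(1.0 - p[comp] + 1e-12))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
pseudo_labels = np.array([0, 1])
loss = negative_training_loss(probs, pseudo_labels, 3, rng)
```

Note that when the sampled complementary class already has low probability, the loss is near zero, so confident, correct predictions are barely perturbed even if some pseudo-labels are wrong.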


SAT: Improving Semi-Supervised Text Classification with Simple Instance-Adaptive Self-Training

Self-training methods have been explored in recent years and have exhibi...

NLNL: Negative Learning for Noisy Labels

Convolutional Neural Networks (CNNs) provide excellent performance when ...

Learning with Partial Labels from Semi-supervised Perspective

Partial Label (PL) learning refers to the task of learning from the part...

Using Deep Learning For Title-Based Semantic Subject Indexing To Reach Competitive Performance to Full-Text

For (semi-)automated subject indexing systems in digital libraries, it i...

LST: Lexicon-Guided Self-Training for Few-Shot Text Classification

Self-training provides an effective means of using an extremely small am...

Uncertainty-aware Self-training for Text Classification with Few Labels

Recent success of large-scale pre-trained language models crucially hing...

CLCIFAR: CIFAR-Derived Benchmark Datasets with Human Annotated Complementary Labels

As a weakly-supervised learning paradigm, complementary label learning (...