Embedding Compression for Text Classification Using Dictionary Screening

11/23/2022
by Jing Zhou, et al.
In this paper, we propose a dictionary screening method for embedding compression in text classification tasks. The key purpose of this method is to evaluate the importance of each keyword in the dictionary. To this end, we first train a pre-specified recurrent neural network-based model using the full dictionary. This yields a benchmark model, which we then use to obtain the predicted class probabilities for each sample in a dataset. Next, we develop a novel method for assessing the importance of each keyword by its impact on the predicted class probabilities. Each keyword can then be screened, and only the most important keywords are retained. With these screened keywords, a new dictionary of considerably reduced size can be constructed, and the original text sequences can be substantially compressed. The proposed method leads to significant reductions in the number of parameters, the average text sequence length, and the dictionary size, while the prediction performance remains very competitive with that of the benchmark model. Extensive numerical studies are presented to demonstrate the empirical performance of the proposed method.
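
The screening pipeline described above can be illustrated with a small sketch. The abstract does not state how keyword importance is computed, so this example assumes a simple deletion-based score: the average absolute change in the benchmark model's predicted class probabilities when a keyword is removed from the samples that contain it. The names `benchmark_probs`, `keyword_importance`, and `screen` are hypothetical, and a toy linear softmax scorer stands in for the trained recurrent benchmark model.

```python
import numpy as np

def benchmark_probs(seq, weights):
    """Hypothetical stand-in for the trained benchmark model:
    softmax over the summed per-token class scores."""
    if len(seq) == 0:
        logits = np.zeros(weights.shape[1])
    else:
        logits = weights[seq].sum(axis=0)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def keyword_importance(seqs, weights, vocab_size):
    """Assumed importance score (not the paper's definition): mean absolute
    change in predicted probabilities when a keyword is deleted."""
    scores = np.zeros(vocab_size)
    counts = np.zeros(vocab_size)
    for seq in seqs:
        base = benchmark_probs(seq, weights)
        for k in set(seq):
            reduced = [t for t in seq if t != k]
            scores[k] += np.abs(base - benchmark_probs(reduced, weights)).sum()
            counts[k] += 1
    # Keywords that never occur keep a score of zero.
    return scores / np.maximum(counts, 1)

def screen(seqs, scores, keep):
    """Keep the `keep` highest-scoring keywords and compress each sequence."""
    kept = set(np.argsort(scores)[::-1][:keep].tolist())
    return kept, [[t for t in seq if t in kept] for seq in seqs]

# Toy setup: 6 keywords, 2 classes. Only keywords 0 and 1 carry class signal.
weights = np.array([[3.0, -3.0], [-3.0, 3.0],
                    [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
seqs = [[0, 2, 3], [1, 4, 5, 2], [0, 3, 4], [1, 2, 5]]

scores = keyword_importance(seqs, weights, vocab_size=6)
kept, compressed = screen(seqs, scores, keep=2)

avg_before = np.mean([len(s) for s in seqs])
avg_after = np.mean([len(s) for s in compressed])
print(sorted(kept), compressed, avg_before, avg_after)
```

In this toy setting the screened dictionary shrinks from six keywords to two and the sequences shorten accordingly, while the stand-in model's predicted class for each sample is unchanged. A real application would replace `benchmark_probs` with the trained recurrent model and would typically retrain a smaller model on the compressed sequences.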

research
12/30/2015

Online Keyword Spotting with a Character-Level Recurrent Neural Network

In this paper, we propose a context-aware keyword spotting model employi...
research
10/06/2021

Weakly-supervised Text Classification Based on Keyword Graph

Weakly-supervised text classification has received much attention in rec...
research
03/14/2023

Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening

Under the flourishing development in performance, current image-text ret...
research
12/11/2022

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Weakly-supervised text classification aims to train a classifier using o...
research
06/24/2016

Interactive Semantic Featuring for Text Classification

In text classification, dictionaries can be used to define human-compreh...
research
07/11/2020

Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification

It has been proved that deep neural networks are facing a new threat cal...
research
05/10/2021

Measuring Economic Policy Uncertainty Using an Unsupervised Word Embedding-based Method

Economic Policy Uncertainty (EPU) is a critical indicator in economic st...
