Using Word Embeddings to Analyze Protests News

The first two tasks of the CLEF 2019 ProtestNews events focused on distinguishing between protest-related and non-protest-related news articles and sentences in a binary classification task. From the submissions, two well-performing models were chosen, and their existing word embeddings, word2vec and FastText, were replaced with ELMo and DistilBERT. Unlike bag-of-words or earlier vector approaches, ELMo and DistilBERT represent words as a sequence of vectors, capturing meaning from the contextual information in the text. With no change to the architecture of the original models other than the word embeddings, the DistilBERT implementation improved performance, measured by an F1-score of 0.66, compared to the FastText implementation. DistilBERT also outperformed ELMo in both tasks and both models. Cleaning the datasets by removing stopwords and lemmatizing the words was shown to make the models more generalizable across contexts when training on a dataset of Indian news articles and evaluating on a dataset of news articles from China.
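As a reminder of how the F1-score mentioned above is computed for a binary task such as protest versus non-protest, here is a minimal Python sketch. The function name `f1_score`, the label encoding (1 = protest, 0 = non-protest), and the toy label lists are illustrative assumptions and are not taken from the paper or the shared-task data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight documents: 1 = protest, 0 = non-protest.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

In this toy case there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and so is the F1-score.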




Utility of general and specific word embeddings for classifying translational stages of research

Conventional text classification models make a bag-of-words assumption r...

Lost in Space: Geolocation in Event Data

Extracting the "correct" location information from text data, i.e., dete...

Identification of Biased Terms in News Articles by Comparison of Outlet-specific Word Embeddings

Slanted news coverage, also called media bias, can heavily influence how...

An Empirical Study of Sections in Classifying Disease Outbreak Reports

Identifying articles that relate to infectious diseases is a necessary s...

Learning Joint Acoustic-Phonetic Word Embeddings

Most speech recognition tasks pertain to mapping words across two modali...

MuSeM: Detecting Incongruent News Headlines using Mutual Attentive Semantic Matching

Measuring the congruence between two texts has several useful applicatio...

Word Embeddings for the Construction Domain

We introduce word vectors for the construction domain. Our vectors were ...
