Danish Stance Classification and Rumour Resolution

07/02/2019
by Anders Edelbo Lillie, et al.
The Internet is rife with rumours that spread through microblogs and social media. Recent work has shown that the stance of the crowd towards a rumour is a good indicator of its veracity. One state-of-the-art system uses an LSTM neural network to automatically classify the stance of Twitter posts by considering the context of a whole conversation branch, while another, a simpler Decision Tree classifier, performs at least as well thanks to careful feature engineering. One approach to predicting the veracity of a rumour is to use stance as the only feature for a Hidden Markov Model (HMM). This thesis generates a stance-annotated Reddit dataset for the Danish language and implements various models for stance classification. Of these, a Linear Support Vector Machine gives the best results, with an accuracy of 0.76 and a macro F1 score of 0.42. Furthermore, experiments show that stance labels can be used across languages and platforms with an HMM to predict the veracity of rumours, achieving an accuracy of 0.82 and an F1 score of 0.67. Even higher scores are achieved by relying only on the Danish dataset: in this case veracity prediction reaches an accuracy of 0.83 and an F1 score of 0.68. Finally, when automatically predicted stance labels are fed to the HMM, only a small drop in performance is observed, showing that the implemented system can have practical applications.
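The gap between the reported accuracy (0.76) and macro F1 (0.42) is typical of imbalanced label distributions, where a classifier can score well on accuracy by favouring the majority class. The sketch below illustrates the effect on toy data. The four-label scheme (support, deny, query, comment) and the example labels are assumptions made for illustration; they are not taken from the thesis or its dataset.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)  # class never predicted correctly
    return sum(scores) / len(scores)


# Hypothetical label set and toy data (assumptions, not from the thesis).
LABELS = ["support", "deny", "query", "comment"]
gold = ["comment"] * 8 + ["support", "deny"]
pred = ["comment"] * 10  # a classifier that always predicts the majority class

acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred, LABELS)
print(f"accuracy={acc:.2f} macro_f1={f1:.2f}")
```

In this toy case the majority-class predictor is right on 8 of 10 posts, yet three of the four classes get an F1 of zero, which pulls the macro average far below the accuracy.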

