Euphemistic Phrase Detection by Masked Language Model

09/10/2021
by Wanzheng Zhu, et al.

Fringe groups and organizations commonly use euphemisms (ordinary-sounding, innocent-looking words with a secret meaning) to conceal what they are discussing. For instance, drug dealers often use "pot" for marijuana and "avocado" for heroin. From a social media content moderation perspective, recent advances in NLP have enabled the automatic detection of such single-word euphemisms, but no existing work can automatically detect multi-word euphemisms such as "blue dream" (marijuana) and "black tar" (heroin). As far as we are aware, our paper is the first to tackle euphemistic phrase detection without human effort. We first perform phrase mining on a raw text corpus (e.g., social media posts) to extract quality phrases. Then, we use word embedding similarities to select a set of euphemistic phrase candidates. Finally, we rank those candidates with a masked language model, SpanBERT. Compared to strong baselines, we report 20-50% higher detection accuracies using our algorithm for detecting euphemistic phrases.
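
The candidate-selection and ranking stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors, the function names (`select_candidates`, `rank_candidates`, `toy_fill_score`), and the toy scores are all hypothetical stand-ins chosen for illustration. In a real system the vectors would come from word embeddings trained on the corpus, the phrases from a phrase-mining step, and the scoring function from a masked language model such as SpanBERT.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_candidates(phrase_vectors, target_vectors, top_k):
    """Keep the top_k phrases most similar to any target keyword.

    Assumption: a phrase is scored by its best cosine similarity to a
    target keyword; the abstract does not specify the exact rule.
    """
    scored = []
    for phrase, vec in phrase_vectors.items():
        best = max(cosine(vec, t) for t in target_vectors.values())
        scored.append((best, phrase))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [phrase for _, phrase in scored[:top_k]]


def rank_candidates(candidates, masked_sentences, fill_score):
    """Order candidates by their average score over masked contexts.

    fill_score(sentence, phrase) is a placeholder for a masked language
    model's estimate of how well `phrase` fits the masked slot.
    """
    def average(phrase):
        total = sum(fill_score(s, phrase) for s in masked_sentences)
        return total / len(masked_sentences)

    return sorted(candidates, key=lambda p: (-average(p), p))


# Hypothetical toy embeddings (made-up numbers, not real data).
TARGET_VECTORS = {"marijuana": [0.9, 0.1, 0.0], "heroin": [0.8, 0.0, 0.2]}
PHRASE_VECTORS = {
    "blue dream": [0.85, 0.15, 0.05],
    "black tar": [0.75, 0.05, 0.25],
    "credit card": [0.0, 0.9, 0.1],
    "next week": [0.1, 0.2, 0.9],
}

# Hypothetical stand-in for a masked language model score.
TOY_SCORES = {"blue dream": 0.7, "black tar": 0.4}


def toy_fill_score(sentence, phrase):
    return TOY_SCORES.get(phrase, 0.0)


candidates = select_candidates(PHRASE_VECTORS, TARGET_VECTORS, top_k=2)
ranked = rank_candidates(candidates, ["selling [MASK] tonight"], toy_fill_score)
print(ranked)
```

Passing the scorer in as a function keeps the similarity filter and the ranking step independent, so the toy scorer could be swapped for a real masked language model without touching the rest of the sketch.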

