Convolutional Neural Networks for Toxic Comment Classification

02/27/2018
by Spiros V. Georgakopoulos, et al.

A flood of information is produced on a daily basis through global Internet usage, arising from online interactive communication among users. While this contributes significantly to the quality of human life, it also involves enormous dangers, since online texts with high toxicity can cause personal attacks, online harassment, and bullying behaviors. This has drawn the attention of both the industrial and research communities in the last few years, and there have been several attempts to identify an efficient model for online toxic comment prediction. However, these efforts are still in their infancy, and new approaches and frameworks are required. In parallel, the constant data explosion makes the construction of new machine learning computational tools for managing this information an imperative need. Thankfully, advances in hardware, cloud computing, and big data management allow the development of Deep Learning approaches, which have shown very promising performance so far. For text classification in particular, the use of Convolutional Neural Networks (CNNs) has recently been proposed, approaching text analytics in a modern manner that emphasizes the structure of words in a document. In this work, we employ this approach to discover toxic comments in a large pool of documents provided by a current Kaggle competition regarding Wikipedia's talk page edits. To justify this decision, we compare CNNs against the traditional bag-of-words approach for text analysis, combined with a selection of algorithms proven to be very effective in text classification. The reported results provide enough evidence that CNNs enhance toxic comment classification, reinforcing research interest in this direction.
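
The contrast between a bag-of-words representation and the word-order structure that a convolutional filter operates on can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `bag_of_words` and `word_windows` and the two example sentences are hypothetical, and the window extraction only mimics the local context a 1D convolution filter of a given width would slide over, without any learned weights.

```python
from collections import Counter


def bag_of_words(tokens):
    """Order-free representation: maps each token to its count."""
    return Counter(tokens)


def word_windows(tokens, width=2):
    """Contiguous word windows of the given width, i.e. the local
    contexts a 1D convolution filter of that width would see."""
    return [tuple(tokens[i:i + width]) for i in range(len(tokens) - width + 1)]


# Two hypothetical comments with the same words in a different order.
a = "i do not think you are wrong".split()
b = "i do think you are not wrong".split()

# Bag-of-words cannot tell them apart; word windows can.
same_bag = bag_of_words(a) == bag_of_words(b)
same_windows = word_windows(a) == word_windows(b)
print(same_bag, same_windows)
```

Under these assumptions, the two sentences share an identical bag of words but produce different window sequences, which is the kind of positional information a bag-of-words model discards.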


research
07/27/2023

Gzip versus bag-of-words for text classification with KNN

The effectiveness of compression distance in KNN-based text classificati...
research
09/24/2017

HDLTex: Hierarchical Deep Learning for Text Classification

The continually increasing number of documents produced each year necess...
research
12/19/2019

Empirical Comparisons of CNN with Other Learning Algorithms for Text Classification in Legal Document Review

Research has shown that Convolutional Neural Networks (CNN) can be effec...
research
04/02/2019

Short Text Classification Improved by Feature Space Extension

With the explosive development of mobile Internet, short text has been a...
research
11/01/2021

Comparative Study of Long Document Classification

The amount of information stored in the form of documents on the interne...
research
12/12/2016

Unraveling reported dreams with text analytics

We investigate what distinguishes reported dreams from other personal na...
research
08/11/2017

Convolutional Neural Networks for Font Classification

Classifying pages or text lines into font categories aids transcription ...
