Using NLP to measure democracy

02/22/2015
by Thiago Marzagão, et al.

This paper uses natural language processing to create the first machine-coded democracy index, which I call Automated Democracy Scores (ADS). The ADS are based on 42 million news articles from 6,043 different sources and cover all independent countries in the 1993-2012 period. Unlike existing democracy indices, the ADS are replicable and have standard errors small enough to distinguish between cases. The ADS are produced with supervised learning. Three approaches are tried: a) a combination of Latent Semantic Analysis and tree-based regression methods; b) a combination of Latent Dirichlet Allocation and tree-based regression methods; and c) the Wordscores algorithm. The Wordscores algorithm outperforms the alternatives, so the ADS are based on it. A web application lets anyone change the training set and see how the results change: democracy-scores.org
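
To make the Wordscores step concrete, here is a minimal sketch of the algorithm in its standard published form (Laver, Benoit and Garry): each word gets a score equal to the average of the reference scores, weighted by how strongly the word is associated with each reference text, and an unscored text receives the frequency-weighted mean of the scores of its words. The abstract does not describe the paper's preprocessing, its training set, or how articles are grouped into country-years, so the function names, the toy texts, and the reference scores of 1.0 and -1.0 below are all illustrative assumptions and not the author's pipeline.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Return each word's share of the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def fit_word_scores(reference_texts, reference_scores):
    """Score each word as the mean of the reference scores, weighted by
    P(reference | word) computed from relative frequencies."""
    freqs = [relative_frequencies(tokens) for tokens in reference_texts]
    vocabulary = set().union(*freqs)
    word_scores = {}
    for word in vocabulary:
        column = [f.get(word, 0.0) for f in freqs]
        total = sum(column)
        word_scores[word] = sum(
            (share / total) * score
            for share, score in zip(column, reference_scores)
        )
    return word_scores


def score_text(tokens, word_scores):
    """Score a new text as the frequency-weighted mean of its word scores.
    Words that never appear in the reference texts are ignored."""
    scored_tokens = [word for word in tokens if word in word_scores]
    if not scored_tokens:
        raise ValueError("text shares no words with the reference texts")
    freqs = relative_frequencies(scored_tokens)
    return sum(share * word_scores[word] for word, share in freqs.items())


# Toy data (hypothetical): two reference "documents" with assumed scores.
reference_texts = [
    "free election free press".split(),
    "military coup censored press".split(),
]
reference_scores = [1.0, -1.0]

word_scores = fit_word_scores(reference_texts, reference_scores)
new_text_score = score_text("free election after coup".split(), word_scores)
```

In this toy setup, "press" appears equally in both reference texts and so ends up with a neutral word score, while "after" is dropped because it never occurs in the references. Changing the reference texts or their scores changes every downstream score, which is the sensitivity to the training set that the web application mentioned above lets users explore.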


