Counting Protests in News Articles: A Dataset and Semi-Automated Data Collection Pipeline

by Tommy Leung, et al.

Between January 2017 and January 2021, thousands of local news sources in the United States reported on over 42,000 protests about topics such as civil rights, immigration, guns, and the environment. Given the vast number of local journalists who report on protests daily, extracting these events as structured data to understand temporal and geographic trends can empower civic decision-making. However, extracting events from news articles presents well-known challenges for the NLP community in domain detection, slot filling, and coreference resolution. To help improve the resources available for extracting structured data from news stories, our contribution is threefold. We 1) release a manually labeled dataset of news article URLs, dates, locations, crowd size estimates, and 494 discrete descriptive tags corresponding to 42,347 reported protest events in the United States between January 2017 and January 2021; 2) describe the semi-automated data collection pipeline used to discover, sort, and review the 144,568 English articles that comprise the dataset; and 3) benchmark a long short-term memory (LSTM) low-dimensional classifier that demonstrates the utility of processing news articles based on syntactic structures, such as paragraphs and sentences, to count the number of reported protest events.




