Annotating Antisemitic Online Content. Towards an Applicable Definition of Antisemitism
Online antisemitism is hard to quantify. How can it be measured on rapidly growing and diversifying platforms? Is the number of antisemitic messages rising in proportion to other content, or is the share of antisemitic content increasing? How does such content travel, and what are the reactions to it? How widespread is online Jew-hatred beyond infamous websites, fora, and closed social media groups? At the root of many of these methodological questions is the challenge of finding a consistent way to identify diverse manifestations of antisemitism in large datasets. What is more, a clear definition is essential for building an annotated corpus that can be used as a gold standard for machine learning programs to detect antisemitic online content. We argue that antisemitic content has distinct features that are not captured adequately by generic approaches to annotation, such as those for hate speech, abusive language, or toxic language. We discuss our experiences with annotating samples from our dataset, which draws on a ten percent random sample of public tweets from Twitter. We show that the widely used definition of antisemitism by the International Holocaust Remembrance Alliance can be applied successfully to online messages if inferences are spelled out in detail and if the focus is not on the intent of the disseminator but on the message in its context. However, annotators have to be highly trained and knowledgeable about current events to understand each tweet's underlying message within its context. The tentative results of the annotation of two of our small but randomly chosen samples suggest that more than ten percent of conversations on Twitter about Jews and Israel are antisemitic or probably antisemitic. They also show that, at least in conversations about Jews, an equally high number of tweets denounce antisemitism, although these conversations do not necessarily coincide.