Quantifying the dynamics of topical fluctuations in language

by Andres Karjus, et al.

The availability of large diachronic corpora has provided the impetus for a growing body of quantitative research on language evolution and meaning change. The central quantities in this research are token frequencies of linguistic elements in the texts, with changes in frequency taken to reflect the popularity or selective fitness of an element. However, corpus frequencies may change for a wide variety of reasons, including purely random sampling effects, or because corpora are composed of contemporary media and fiction texts within which the underlying topics ebb and flow with cultural and socio-political trends. In this work, we introduce a computationally simple model for controlling for topical fluctuations in corpora, the topical-cultural advection model, and demonstrate how it provides a robust baseline of variability in word frequency changes over time. We validate the model on a diachronic corpus spanning two centuries and on a carefully controlled artificial language change scenario, and then use it to correct for topical fluctuations in historical time series. Finally, we use the model to show that the emergence of new words typically corresponds with the rise of a trending topic. This suggests that some lexical innovations occur due to growing communicative need in a subspace of the lexicon, and that the topical-cultural advection model can be used to quantify this.



Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change

Understanding how words change their meanings over time is key to models...

Cultural Shift or Linguistic Drift? Comparing Two Computational Measures of Semantic Change

Words shift in meaning for many reasons, including cultural factors like...

Characterizing the Google Books corpus: Strong limits to inferences of socio-cultural and linguistic evolution

It is tempting to treat frequency trends from the Google Books data sets...

Self-contained Beta-with-Spikes Approximation for Inference Under a Wright-Fisher Model

We construct a reliable estimation of evolutionary parameters within the...

Computational Paremiology: Charting the temporal, ecological dynamics of proverb use in books, news articles, and tweets

Proverbs are an essential component of language and culture, and though ...

A fully data-driven method to identify (correlated) changes in diachronic corpora

In this paper, a method for measuring synchronic corpus (dis-)similarity...
