Twitter Referral Behaviours on News Consumption with Ensemble Clustering of Click-Stream Data in Turkish Media

by Didem Makaroglu, et al.

Since news outlets went digital, click-stream data, generated in massive volumes by human activity on websites, has become a prominent source for newsrooms to identify readers' characteristics. Processing this streaming data requires elastic architectures, particularly during unprecedented traffic, so that more comprehensive analyses can be carried out, such as recommending the most closely related articles to readers. Although click-stream data follows a similar logic across websites, it has inherent limitations for recognizing human behavior from a broad perspective, which makes it necessary to restrict the problem to niche areas. This study investigates anonymized readers' click activity on the organizations' websites to identify the news consumption patterns of readers referred from Twitter, who reach the site incidentally and whose interest lies mainly in the routed news content. The investigation is broadened by linking the log data with news content, enriching the insights beyond the web journey alone. Ensemble cluster analysis methods with mixed-type embedding strategies are applied and compared to find similar reader groups and interests independent of time. Our results demonstrate that clustering quality on the mixed-type data set approaches optimal internal validation scores when the data are embedded with Uniform Manifold Approximation and Projection (UMAP) and the consensus function is used as a key to identify the most applicable hyperparameter configurations in the given ensemble, rather than using the consensus function's results directly. Evaluation of the resulting clusters highlights specific clusters that recur across the samples; these provide insights to news organizations and help overcome the degradation of behavior models caused by readers' interests changing over time.
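
The abstract does not say which consensus function or selection rule the authors use, so the Python below is only a minimal sketch, under stated assumptions, of the idea of using a consensus function as a key to pick one configuration from the ensemble instead of reporting the consensus partition itself. It assumes a co-association consensus (two points are linked when they share a cluster in more than half of the runs) and scores each ensemble member by its Rand index against that consensus. `select_configuration`, the toy labelings, and the hyperparameter dictionaries are hypothetical; in a real pipeline each labeling would come from clustering a UMAP embedding under one configuration.

```python
from itertools import combinations


def consensus_labels(labelings, threshold=0.5):
    """Co-association consensus (assumed): link two points when they share a
    cluster in more than `threshold` of the ensemble members, then take the
    connected components of that graph as the consensus partition."""
    n, m = len(labelings[0]), len(labelings)
    parent = list(range(n))  # union-find forest over the n points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        if together > threshold * m:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def select_configuration(configs, labelings, threshold=0.5):
    """Hypothetical helper: use the consensus only as a key, returning the
    ensemble member (and its configuration) that agrees with it most."""
    consensus = consensus_labels(labelings, threshold)
    scores = [rand_index(labels, consensus) for labels in labelings]
    best = max(range(len(configs)), key=lambda k: scores[k])  # ties -> earliest
    return configs[best], labelings[best], consensus


# Toy ensemble: hypothetical configurations and the labelings they produced.
configs = [
    {"n_neighbors": 5, "n_clusters": 2},
    {"n_neighbors": 15, "n_clusters": 2},
    {"n_neighbors": 30, "n_clusters": 3},
]
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as above, labels permuted
    [0, 0, 1, 1, 2, 2],
]

best_config, best_labels, consensus = select_configuration(configs, labelings)
print(best_config, consensus)  # {'n_neighbors': 5, 'n_clusters': 2} [0, 0, 0, 1, 1, 1]
```

In this toy ensemble the third configuration splits the points into three small groups that most runs do not support, so its Rand index against the consensus is only 2/3, while the first two agree with it exactly (ties go to the earliest configuration). Returning an actual ensemble member rather than the consensus partition keeps the chosen hyperparameters explicit and reusable, which is one plausible reading of the design choice described above.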




Online News Media Website Ranking Using User Generated Content

News media websites are important online resources that have drawn great...

Simplifying Multilingual News Clustering Through Projection From a Shared Space

The task of organizing and clustering multilingual news articles for med...

HoaxItaly: a collection of Italian disinformation and fact-checking stories shared on Twitter in 2019

We released over 1 million tweets shared during 2019 and containing link...

Predicting Factuality of Reporting and Bias of News Media Sources

We present a study on predicting the factuality of reporting and bias of...

Partial Mobilization: Tracking Multilingual Information Flows Amongst Russian Media Outlets and Telegram

In response to disinformation and propaganda from Russian online media f...

A Scalable and Robust Framework for Data Stream Ingestion

An essential part of building a data-driven organization is the ability ...
