Story Disambiguation: Tracking Evolving News Stories across News and Social Streams

08/16/2018
by Bichen Shi, et al.

Following a particular news story online is an important but difficult task, as the relevant information is often scattered across different domains/sources (e.g., news articles, blogs, comments, tweets), presented in various formats and language styles, and may overlap with thousands of other stories. In this work we join the areas of topic tracking and entity disambiguation, and propose a framework named Story Disambiguation, a cross-domain story tracking approach that builds on real-time entity disambiguation and a learning-to-rank framework to represent and update the rich semantic structure of news stories. Given a target news story, specified by a seed set of documents, the goal is to effectively select new story-relevant documents from an incoming document stream. We represent stories as entity graphs and we model the story tracking problem as a learning-to-rank task. This enables us to track content with high accuracy, from multiple domains, in real time. We study a range of text, entity and graph based features to understand which types of features are most effective for representing stories. We further propose new semi-supervised learning techniques to automatically update the story representation over time. Our empirical study shows that we outperform the accuracy of state-of-the-art methods for tracking mixed-domain document streams, while requiring less labeled data to seed the tracked stories. This is particularly the case for local news stories that are easily overshadowed by other trending stories, and for complex news stories with ambiguous content in noisy stream environments.
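To make the pipeline described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each document has already been reduced to a list of disambiguated entity names, builds a toy entity co-occurrence graph from the seed documents, and scores incoming documents by node and edge overlap. The fixed 0.5/0.5 weights and the threshold are hand-set stand-ins for a learned ranking model, and the names `StoryGraph`, `track`, and `threshold` are hypothetical.

```python
from collections import Counter
from itertools import combinations


class StoryGraph:
    """Toy entity graph: nodes are entities, edges are co-occurrences in a document."""

    def __init__(self, seed_docs):
        self.nodes = Counter()  # entity -> number of story documents mentioning it
        self.edges = Counter()  # (entity_a, entity_b) -> co-occurrence count
        self.size = 0           # number of documents absorbed so far
        for entities in seed_docs:
            self.add(entities)

    def add(self, entities):
        """Fold one document (a list of entity names) into the story graph."""
        ents = sorted(set(entities))
        self.nodes.update(ents)
        self.edges.update(combinations(ents, 2))
        self.size += 1

    def score(self, entities):
        """Assumed stand-in for a learned ranker: mean of node and edge overlap."""
        ents = sorted(set(entities))
        if not ents or self.size == 0:
            return 0.0
        node_overlap = sum(self.nodes[e] for e in ents) / (self.size * len(ents))
        pairs = list(combinations(ents, 2))
        edge_overlap = 0.0
        if pairs:
            edge_overlap = sum(self.edges[p] for p in pairs) / (self.size * len(pairs))
        return 0.5 * node_overlap + 0.5 * edge_overlap


def track(story, stream, threshold=0.5):
    """Select documents at or above the threshold and fold them back into the story."""
    selected = []
    for doc_id, entities in stream:
        if story.score(entities) >= threshold:
            selected.append(doc_id)
            story.add(entities)  # self-training style update, no new labels needed
    return selected


# Illustrative data only; the entity names are invented for this sketch.
seed = [["Dublin", "Irish Water", "Protest"], ["Irish Water", "Protest", "Dail"]]
stream = [
    ("d1", ["Irish Water", "Protest"]),
    ("d2", ["Apple", "iPhone"]),
    ("d3", ["Protest", "Dublin", "Apple"]),
]
story = StoryGraph(seed)
selected = track(story, stream)
```

In this toy run only `d1` clears the threshold and is absorbed into the story. A learning-to-rank setup would instead learn how to combine text, entity and graph features from labeled examples rather than using fixed weights.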

research
04/08/2023

Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding

Unsupervised discovery of stories with correlated news articles in real-...
research
03/01/2018

Growing Story Forest Online from Massive Breaking News

We describe our experience of implementing a news content organization s...
research
03/16/2020

Identifying Notable News Stories

The volume of news content has increased significantly in recent years a...
research
11/13/2020

Cross-Domain Learning for Classifying Propaganda in Online Contents

As news and social media exhibit an increasing amount of manipulative po...
research
10/13/2021

Assisting News Media Editors with Cohesive Visual Storylines

Creating a cohesive, high-quality, relevant, media story is a challenge ...
research
08/02/2022

How UMass-FSD Inadvertently Leverages Temporal Bias

First Story Detection describes the task of identifying new events in a ...
research
01/26/2020

Generating Representative Headlines for News Stories

of news articles are published online every day, which can be overwhelm...
