TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection

02/20/2018
by   Tirthankar Ghosal, et al.
Detecting the novelty of an entire document is a frontier problem in Artificial Intelligence (AI) with widespread NLP applications, such as extractive document summarization, tracking the development of news events, and predicting the impact of scholarly articles. Important though the problem is, we are unaware of any document-level benchmark data that correctly addresses the evaluation of automatic novelty detection techniques in a classification framework. To bridge this gap, we present a resource for benchmarking document-level novelty detection techniques. We create the resource via periodic, event-specific crawling of news documents across several domains. We release the annotated corpus with the necessary statistics and demonstrate its use with a system developed for the problem.

