WEC: Deriving a Large-scale Cross-document Event Coreference dataset from Wikipedia

04/11/2021
by Alon Eirew, et al.

Cross-document event coreference resolution is a foundational task for NLP applications involving multi-text processing. However, existing corpora for this task are scarce and relatively small, and they annotate only modest-size clusters of documents belonging to the same topic. To complement these resources and enhance future research, we present Wikipedia Event Coreference (WEC), an efficient methodology for gathering a large-scale dataset for cross-document event coreference from Wikipedia, where coreference links are not restricted to predefined topics. We apply this methodology to the English Wikipedia and extract our large-scale WEC-Eng dataset. Notably, our dataset creation method is generic and can be applied with relatively little effort to other Wikipedia languages. To set baseline results, we develop an algorithm that adapts components of state-of-the-art models for within-document coreference resolution to the cross-document setting. Our model is suitably efficient and outperforms previously published state-of-the-art results for the task.


research
10/23/2022

Cross-document Event Coreference Search: Task, Dataset and Modeling

The task of Cross-document Coreference Resolution has been traditionally...
research
09/23/2020

Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling

Recent evaluation protocols for Cross-document (CD) coreference resoluti...
research
11/24/2020

Cross-Document Event Coreference Resolution Beyond Corpus-Tailored Systems

Cross-document event coreference resolution (CDCR) is an NLP task in whi...
research
05/24/2010

Distantly Labeling Data for Large Scale Cross-Document Coreference

Cross-document coreference, the problem of resolving entity mentions acr...
research
10/03/2017

Event Identification as a Decision Process with Non-linear Representation of Text

We propose scale-free Identifier Network (sfIN), a novel model for event ...
research
04/28/2020

MAVEN: A Massive General Domain Event Detection Dataset

Event detection (ED), which identifies event trigger words and classifie...
research
04/22/2015

A Hierarchical Distance-dependent Bayesian Model for Event Coreference Resolution

We present a novel hierarchical distance-dependent Bayesian model for ev...
