Re-TACRED: Addressing Shortcomings of the TACRED Dataset

by George Stoica, et al.

TACRED is one of the largest and most widely used sentence-level relation extraction datasets. Proposed models that are evaluated using this dataset consistently set new state-of-the-art performance. However, they still exhibit large error rates despite leveraging external knowledge and unsupervised pretraining on large text corpora. A recent study suggested that this may be due to poor dataset quality. The study observed that over 50% of the most challenging sentences from the development and test sets are incorrectly labeled and account for an average drop of 8% f1-score in model performance. However, this study was limited to a small biased sample of 5k (out of a total of 106k) sentences, substantially restricting the generalizability and broader implications of its findings. In this paper, we address these shortcomings by: (i) performing a comprehensive study over the whole TACRED dataset, (ii) proposing an improved crowdsourcing strategy and deploying it to re-annotate the whole dataset, and (iii) performing a thorough analysis to understand how correcting the TACRED annotations affects previously published results. After verification, we observed that 23.9% of TACRED labels are incorrect. Moreover, evaluating several models on our revised dataset yields an average f1-score improvement of 14.3% and helps uncover significant relationships between the different models (rather than simply offsetting or scaling their scores by a constant factor). Finally, aside from our analysis, we also release Re-TACRED, a new completely re-annotated version of the TACRED dataset that can be used to perform reliable evaluation of relation extraction models.




A Two-step Approach for Handling Zero-Cardinality in Relation Extraction

Relation tuple extraction from text is an important task for building kn...

TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task

TACRED (Zhang et al., 2017) is one of the largest, most widely used crow...

About Evaluation of F1 Score for RECENT Relation Extraction System

This document contains a discussion of the F1 score evaluation used in t...

Relation Extraction with Explanation

Recent neural models for relation extraction with distant supervision al...

DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction

Distant supervision (DS) is a well established technique for creating la...

What do You Mean by Relation Extraction? A Survey on Datasets and Study on Scientific Relation Classification

Over the last five years, research on Relation Extraction (RE) witnessed...

Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction

The robustness to distribution changes ensures that NLP models can be su...
