Re-Examining Human Annotations for Interpretable NLP

04/10/2022
by Cheng-Han Chiang, et al.

Explanation methods in Interpretable NLP often explain a model's decision by extracting evidence (a rationale) from the input text that supports the decision. Benchmark datasets with ground-truth rationales have been released to evaluate how good the extracted rationales are. The ground-truth rationales in these datasets are often human annotations collected through crowd-sourcing platforms. Valuable as these datasets are, the details of how the human annotations were obtained are often not clearly specified. We conduct comprehensive controlled experiments on crowd-sourcing platforms with two widely used datasets in Interpretable NLP to understand how these unstated details can affect the annotation results. Specifically, we compare the annotations obtained from workers recruited at different qualification levels, and we give high-quality workers different instructions for completing the same underlying tasks. Our results reveal that annotation quality is highly dependent on the workers' qualifications, and that workers can be steered toward particular annotations by the instructions they receive. We further show that specific explanation methods perform better when evaluated against ground-truth rationales collected under particular instructions. Based on these observations, we highlight the importance of reporting the annotation process in full and call for careful interpretation of any experimental results obtained using such annotations.
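
For context on how such evaluations are typically carried out: an explanation method is scored by comparing the tokens it selects as evidence against the tokens human annotators marked, for example with a token-level F1 score as in ERASER-style benchmarks. The sketch below is a minimal illustration of that comparison; the binary-mask representation, the function name, and the example inputs are assumptions made for illustration, not the paper's actual evaluation code.

```python
# Minimal sketch (illustrative, not from the paper): token-level F1 between a
# rationale extracted by an explanation method and a human-annotated
# ground-truth rationale, both represented as binary masks over the tokens.

def rationale_token_f1(predicted_mask, gold_mask):
    """Token-level F1 between two equal-length binary rationale masks."""
    assert len(predicted_mask) == len(gold_mask)
    tp = sum(p and g for p, g in zip(predicted_mask, gold_mask))       # selected by both
    fp = sum(p and not g for p, g in zip(predicted_mask, gold_mask))   # selected only by the method
    fn = sum(g and not p for p, g in zip(predicted_mask, gold_mask))   # selected only by the annotator
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 1 marks a token selected as evidence, 0 marks a token left out.
    predicted = [0, 1, 1, 0, 0, 1]   # rationale extracted by an explanation method
    gold      = [0, 1, 1, 1, 0, 0]   # rationale marked by a crowd worker
    print(f"token-level F1: {rationale_token_f1(predicted, gold):.3f}")
```

Under a metric of this kind, the findings above imply that the same explanation method can receive noticeably different scores depending on which set of crowd-sourced annotations is treated as the gold mask.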

Related research

Needle in a Haystack: An Analysis of Finding Qualified Workers on MTurk for Summarization (12/20/2022)
The acquisition of high-quality human annotations through crowdsourcing ...

ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks (03/27/2023)
Many NLP applications require manual data annotations for a variety of t...

Wisdom for the Crowd: Discoursive Power in Annotation Instructions for Computer Vision (05/23/2021)
Developers of computer vision algorithms outsource some of the labor inv...

In Search of Ambiguity: A Three-Stage Workflow Design to Clarify Annotation Guidelines for Crowd Workers (12/04/2021)
We propose a novel three-stage FIND-RESOLVE-LABEL workflow for crowdsour...

Truth Inference at Scale: A Bayesian Model for Adjudicating Highly Redundant Crowd Annotations (02/24/2019)
Crowd-sourcing is a cheap and popular means of creating training and eva...

How Crowd Worker Factors Influence Subjective Annotations: A Study of Tagging Misogynistic Hate Speech in Tweets (09/03/2023)
Crowdsourced annotation is vital to both collecting labelled data to tra...

ERASER: A Benchmark to Evaluate Rationalized NLP Models (11/08/2019)
State-of-the-art models in NLP are now predominantly based on deep neura...
