Automated Evaluation of Out-of-Context Errors

03/23/2018
by Patrick Huber, et al.

We present a new approach to evaluating computational models for the task of text understanding by means of out-of-context error detection. Through the novel design of our automated modification process, existing large-scale data sources can be adapted for a vast number of text understanding tasks. The data is altered on a semantic level, allowing models to be tested against a challenging set of modified text passages whose understanding requires comprehension of a broader narrative discourse. Our newly introduced task targets real-world problems of transcription and translation systems by inserting authentic out-of-context errors. The automated modification process is applied to the 2016 TEDTalk corpus. Fully automating the process allows complete datasets to be adopted at low cost, enabling supervised learning procedures and deeper networks to be trained and tested. To evaluate the quality of the modification algorithm, a language model and a supervised binary classification model are trained and tested on the altered dataset, and a human baseline evaluation is conducted to compare the results with human performance. The outcome of the evaluation indicates how difficult it is for both machine-learning algorithms and humans to detect these semantic errors, showing that the errors cannot be identified when the context is limited to a single sentence.
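The abstract does not spell out the modification procedure, but the core idea of inserting out-of-context errors can be illustrated with a minimal sketch. The Python snippet below assumes a simple word-substitution scheme in which a content word in one talk is replaced by a word drawn from a different talk, so the sentence stays locally fluent but no longer fits the surrounding discourse; the corpus format, the stop-word filter, and the swap heuristic are illustrative assumptions, not the authors' actual algorithm.

```python
import random

# Hypothetical out-of-context error injection (not the paper's exact method):
# swap one content word in a sentence with a word taken from a different talk.

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that"}

def inject_out_of_context_error(talks, rng=random):
    """Return (modified_sentence, original_sentence, label=1) for one talk.

    talks: list of talks, each talk a list of sentences,
           each sentence a list of word tokens.
    """
    src_idx, donor_idx = rng.sample(range(len(talks)), 2)
    sentence = list(rng.choice(talks[src_idx]))

    # Pick a content-word position in the chosen sentence.
    positions = [i for i, w in enumerate(sentence) if w.lower() not in STOP_WORDS]
    if not positions:
        return None  # nothing safe to replace in this sentence
    pos = rng.choice(positions)

    # Draw a replacement content word from a different talk.
    donor_words = [w for s in talks[donor_idx] for w in s
                   if w.lower() not in STOP_WORDS]
    original = list(sentence)
    sentence[pos] = rng.choice(donor_words)

    return sentence, original, 1  # label 1 = contains an out-of-context error
```

Sentence pairs produced this way, together with unmodified sentences labelled 0, would supply the positive and negative examples needed to train the supervised binary classification model described in the abstract.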
