Multi-Scales Data Augmentation Approach In Natural Language Inference For Artifacts Mitigation And Pre-Trained Model Optimization

12/16/2022
by Zhenyuan Lu, et al.

Machine learning models can reach high performance on benchmark natural language processing (NLP) datasets yet fail in more challenging settings. We study this problem in the case where a pre-trained model learns dataset artifacts in natural language inference (NLI), the task of determining the logical relationship between a pair of text sequences. We provide a variety of techniques for analyzing and locating dataset artifacts in the crowdsourced Stanford Natural Language Inference (SNLI) corpus, and we study the stylistic patterns of those artifacts. To mitigate them, we employ a multi-scale data augmentation technique built on two distinct frameworks: a behavioral testing checklist at the sentence level and lexical synonym criteria at the word level. This combined method improves our model's robustness under perturbation testing, enabling it to consistently outperform the pre-trained baseline.
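To make the two scales concrete, here is a minimal Python sketch of how an NLI training example might be augmented at the word level and at the sentence level. It is an illustration under stated assumptions, not the authors' implementation: the `SYNONYMS` table, the tautology suffix, and the function names are all hypothetical, and the abstract does not specify which checklist tests or synonym criteria were used.

```python
import random

# Hypothetical toy synonym table (assumption); a real setup would likely
# draw candidates from a lexical resource and filter them by some criteria.
SYNONYMS = {
    "man": ["guy"],
    "sleeping": ["asleep"],
    "outside": ["outdoors"],
}

# Hypothetical label-preserving suffix for a checklist-style invariance test.
TAUTOLOGY = "and true is true"


def word_level_augment(sentence, rng, prob=1.0):
    """Word-level scale: swap known words for a synonym with probability `prob`."""
    out = []
    for token in sentence.split():
        options = SYNONYMS.get(token.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(token)
    return " ".join(out)


def sentence_level_augment(example):
    """Sentence-level scale: append a tautology to the hypothesis.

    The perturbation adds no information, so the label is kept unchanged.
    """
    perturbed = dict(example)
    perturbed["hypothesis"] = f"{example['hypothesis']} {TAUTOLOGY}"
    return perturbed


def augment(example, rng):
    """Return the original example plus one variant per augmentation scale."""
    word_variant = dict(example)
    word_variant["premise"] = word_level_augment(example["premise"], rng)
    word_variant["hypothesis"] = word_level_augment(example["hypothesis"], rng)
    return [example, word_variant, sentence_level_augment(example)]


if __name__ == "__main__":
    sample = {
        "premise": "A man is sleeping outside",
        "hypothesis": "A man is outside",
        "label": "entailment",
    }
    for variant in augment(sample, random.Random(0)):
        print(variant)
```

Both perturbations are designed to preserve the gold label, so the augmented pairs can be added to the training set directly or held out as a perturbation test set against which the baseline and the augmented model are compared.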


